PREFACE


This book and its companion volume Advanced Real Analysis systematically 
develop concepts and tools in real analysis that are vital to every mathematician, 
whether pure or applied, aspiring or established. The two books together contain 
what the young mathematician needs to know about real analysis in order to 
communicate well with colleagues in all branches of mathematics. 

The books are written as textbooks, and their primary audience is students who 
are learning the material for the first time and who are planning a career in which 
they will use advanced mathematics professionally. Much of the material in the 
books corresponds to normal course work. Nevertheless, it is often the case that 
core mathematics curricula, time-limited as they are, do not include all the topics 
that one might like. Thus the book includes important topics that may be skipped 
in required courses but that the professional mathematician will ultimately want 
to learn by self-study. 

The content of the required courses at each university reflects expectations of 
what students need before beginning specialized study and work on a thesis. These
expectations vary from country to country and from university to university. Even 
so, there seems to be a rough consensus about what mathematics a plenary lecturer 
at a broad international or national meeting may take as known by the audience. 
The tables of contents of the two books represent my own understanding of what 
that degree of knowledge is for real analysis today. 


Key topics and features of Basic Real Analysis are as follows: 


• Early chapters treat the fundamentals of real variables, sequences and series
of functions, the theory of Fourier series for the Riemann integral, metric
spaces, and the theoretical underpinnings of multivariable calculus and
ordinary differential equations.

• Subsequent chapters develop the Lebesgue theory in Euclidean and abstract
spaces, Fourier series and the Fourier transform for the Lebesgue integral,
point-set topology, measure theory in locally compact Hausdorff spaces, and
the basics of Hilbert and Banach spaces.

• The subjects of Fourier series and harmonic functions are used as recurring
motivation for a number of theoretical developments.

• The development proceeds from the particular to the general, often introducing
examples well before a theory that incorporates them.

• More than 300 problems at the ends of chapters illuminate aspects of the
text, develop related topics, and point to additional applications. A separate
55-page section “Hints for Solutions of Problems” at the end of the book gives
detailed hints for most of the problems, together with complete solutions for
many.


Beyond a standard calculus sequence in one and several variables, the most 
important prerequisite for using Basic Real Analysis is that the reader already 
know what a proof is, how to read a proof, and how to write a proof. This 
knowledge typically is obtained from honors calculus courses, or from a course 
in linear algebra, or from a first junior-senior course in real variables. In addition, 
it is assumed that the reader is comfortable with a modest amount of linear algebra,
including row reduction of matrices, vector spaces and bases, and the associated 
geometry. A passing acquaintance with the notions of group, subgroup, and 
quotient is helpful as well. 

Chapters I-IV are appropriate for a single rigorous real-variables course and 
may be used in either of two ways. For students who have learned about proofs 
from honors calculus or linear algebra, these chapters offer a full treatment of real 
variables, leaving out only the more familiar parts near the beginning, such as
elementary manipulations with limits, convergence tests for infinite series with 
positive scalar terms, and routine facts about continuity and differentiability. For 
students who have learned about proofs from a first junior-senior course in real 
variables, these chapters are appropriate for a second such course that begins with 
Riemann integration and sequences and series of functions; in this case the first 
section of Chapter I will be a review of some of the more difficult foundational 
theorems, and the course can conclude with an introduction to the Lebesgue 
integral from Chapter V if time permits. 

Chapters V through XII treat Lebesgue integration in various settings, as well 
as introductions to the Euclidean Fourier transform and to functional analysis. 
Typically this material is taught at the graduate level in the United States, fre- 
quently in one of three ways: The first way does Lebesgue integration in Euclidean 
and abstract settings and goes on to consider the Euclidean Fourier transform in 
some detail; this corresponds to Chapters V–VIII. A second way does Lebesgue
integration in Euclidean and abstract settings, treats $L^p$ spaces and integration on
locally compact Hausdorff spaces, and concludes with an introduction to Hilbert
and Banach spaces; this corresponds to Chapters V–VII, part of IX, and XI–XII.
A third way combines an introduction to the Lebesgue integral and the Euclidean
Fourier transform with some of the subject of partial differential equations; this
corresponds to some portion of Chapters V–VI and VIII, followed by chapters
from the companion volume Advanced Real Analysis. 

In my own teaching, I have most often built one course around Chapters I–IV
and another around Chapters V–VII, part of IX, and XI–XII. I have normally
assigned the easier sections of Chapters II and X as outside reading, indicating
the date when the lectures would begin to use that material. 

More detailed information about how the book may be used with courses may 
be deduced from the chart “Dependence among Chapters” on page xiv and the 
section “Guide for the Reader” on pages xv–xvii.

The problems at the ends of chapters are an important part of the book. Some 
of them are really theorems, some are examples showing the degree to which 
hypotheses can be stretched, and a few are just exercises. The reader gets no 
indication which problems are of which type, nor of which ones are relatively 
easy. Each problem can be solved with tools developed up to that point in the 
book, plus any additional prerequisites that are noted. 


Two omissions from the pair of books are of note. One is any treatment of 
Stokes’s Theorem and differential forms. Although there is some advantage, 
when studying these topics, in having the Lebesgue integral available and in 
having developed an attitude that integration can be defined by means of suitable 
linear functionals, the topic of Stokes’s Theorem seems to fit better in a book 
about geometry and topology, rather than in a book about real analysis. 

The other omission concerns the use of complex analysis. It is tempting to try 
to combine real analysis and complex analysis into a single subject, but my own 
experience is that this combination does not work well at the level of Basic Real 
Analysis, only at the level of Advanced Real Analysis. 

Almost all of the mathematics in the two books is at least forty years old, and I 
make no claim that any result is new. The books are a distillation of lecture notes 
from a 35-year period of my own learning and teaching. Sometimes a problem at 
the end of a chapter or an approach to the exposition may not be a standard one, 
but no attempt has been made to identify such problems and approaches. In the 
reverse direction it is possible that my early lecture notes have directly quoted 
some source without proper attribution. As an attempt to rectify any difficulties 
of this kind, I have included a section of “Acknowledgements” on pages xix–xx
of this volume to identify the main sources, as far as I can reconstruct them, for 
those original lecture notes. 

I am grateful to Ann Kostant and Steven Krantz for encouraging this project and
for making many suggestions about pursuing it, and to Susan Knapp and David
Kramer for helping with the readability. The typesetting was by AMS-TeX, and
the figures were drawn with Mathematica.

I invite corrections and other comments from readers. I plan to maintain a list 
of known corrections on my own Web page. 

A. W. KNAPP 
May 2005 


DEPENDENCE AMONG CHAPTERS 


Below is a chart of the main lines of dependence of chapters on prior chapters. 
The dashed lines indicate helpful motivation but no logical dependence. Apart 
from that, particular examples may make use of information from earlier chapters 
that is not indicated by the chart. 


[Chart: dependence among chapters, beginning with Chapters I, II, III in order.]


GUIDE FOR THE READER 


This section is intended to help the reader find out what parts of each chapter are 
most important and how the chapters are interrelated. Further information of this 
kind is contained in the abstracts that begin each of the chapters. 

The book pays attention to certain recurring themes in real analysis, allowing 
a person to see how these themes arise in increasingly sophisticated ways. Ex- 
amples are the role of interchanges of limits in theorems, the need for certain 
explicit formulas in the foundations of subject areas, the role of compactness and 
completeness in existence theorems, and the approach of handling nice functions 
first and then passing to general functions. 

All of these themes are introduced in Chapter I, and already at that stage they 
interact in subtle ways. For example, a natural investigation of interchanges of 
limits in Sections 2-3 leads to the discovery of Ascoli’s Theorem, which is a 
fundamental compactness tool for proving existence results. Ascoli’s Theorem 
is proved by the “Cantor diagonal process,” which has other applications to
compactness questions and does not get fully explained until Chapter X. The 
consequence is that, no matter where in the book a reader plans to start, everyone 
will be helped by at least leafing through Chapter I. 


The remainder of this section is an overview of individual chapters and groups 
of chapters. 

Chapter I. Every section of this chapter plays a role in setting up matters 
for later chapters. No knowledge of metric spaces is assumed anywhere in the 
chapter. Section 1 will be a review for anyone who has already had a course in
real-variable theory; the section shows how compactness and completeness address
all the difficult theorems whose proofs are often skipped in calculus. Section 2 
begins the development of real-variable theory at the point of sequences and series 
of functions. It contains interchange results that turn out to be special cases of 
the main theorems of Chapter V. Sections 8–9 introduce the approach of handling
nice functions before general functions, and Section 10 introduces Fourier series, 
which provided a great deal of motivation historically for the development of real 
analysis and are used in this book in that same way. Fourier series are somewhat 
limited in the setting of Chapter I because one encounters no class of functions, 
other than infinitely differentiable ones, that corresponds exactly to some class of 
Fourier coefficients; as a result Fourier series, with Riemann integration in use, 
are not particularly useful for constructing new functions from old ones. This
defect will be fixed with the aid of the Lebesgue integral in Chapter VI. 


Chapter II. Now that continuity and convergence have been addressed on 
the line, this chapter establishes a framework for these questions in higher- 
dimensional Euclidean space and other settings. There is no point in ad hoc 
definitions for each setting, and metric spaces handle many such settings at once. 
Chapter X later will enlarge the framework from metric spaces to “topological 
spaces.” Sections 1-6 of Chapter II are routine. Section 7, on compactness 
and completeness, is the core. The Baire Category Theorem in Section 9 is not 
used outside of Chapter II until Chapter XII, and it may therefore be skipped 
temporarily. Section 10 contains the Stone–Weierstrass Theorem, which is a
fundamental approximation tool. Section 11 is used in some of the problems but 
is not otherwise used in the book. 

Chapter III. This chapter does for the several-variable theory what Chapter I 
has done for the one-variable theory. The main results are the Inverse and Implicit 
Function Theorems in Section 6 and the change-of-variables formula for multiple 
integrals in Section 10. The change-of-variables formula has to be regarded as 
only a preliminary version, since what it directly accomplishes for the change 
to polar coordinates still needs supplementing; this difficulty will be repaired in 
Chapter VI with the aid of the Lebesgue integral. Section 4, on exponentials 
of matrices, may be skipped if linear systems of ordinary differential equations 
are going to be skipped in Chapter IV. Some of the problems at the end of the 
chapter introduce harmonic functions; harmonic functions will be combined with 
Fourier series in problems in later chapters to motivate and illustrate some of the 
development. 

Chapter IV provides theoretical underpinnings for the material in a traditional 
undergraduate course in ordinary differential equations. Nothing later in the book 
is logically dependent on Chapter IV; however, Chapter XII includes a discussion 
of orthogonal systems of functions, and the examples of these that arise in Chapter 
IV are helpful as motivation. Some people shy away from differential equations 
and might wish to treat Chapter IV only lightly, or perhaps not at all. The subject 
is nevertheless of great importance, and Chapter IV is the beginning of it. A 
minimal treatment of Chapter IV might involve Sections 1–2 and Section 8, all
of which visibly continue the themes begun in Chapter I. 

Chapters V–VI treat the core of measure theory, including the basic convergence
theorems for integrals, the development of Lebesgue measure in one and
several variables, Fubini’s Theorem, the metric spaces $L^1$ and $L^2$ and $L^\infty$, and
the use of maximal theorems for getting at differentiation of integrals and other 
theorems concerning almost-everywhere convergence. In Chapter V Lebesgue 
measure in one dimension is introduced right away, so that one immediately has 
the most important example at hand. The fundamental Extension Theorem for 
getting measures to be defined on σ-rings and σ-algebras is stated when needed but
is proved only after the basic convergence theorems for integrals have been proved;
the proof in Sections 5–6 may be skipped on first reading. Section 7, on Fubini’s
Theorem, contains a powerful result about interchange of integrals. At the same time
that it justifies interchange, it also constructs a “double integral”; consequently 
the section prepares the way for the construction in Chapter VI of n-dimensional 
Lebesgue measure from 1-dimensional Lebesgue measure. Section 10 introduces 
normed linear spaces along with the examples of $L^1$ and $L^2$ and $L^\infty$, and it goes
on to establish some properties of all normed linear spaces. Chapter VI fleshes 
out measure theory as it applies to Euclidean space in more than one dimension. 
Of special note is the Lebesgue-integration version in Section 5 of the
change-of-variables formula for multiple integrals and the Riesz–Fischer Theorem in
Section 7. The latter characterizes square-integrable periodic functions by their
Fourier coefficients and makes the subject of Fourier series useful in constructing
functions. Differentiation of integrals is approached in Section 6 of Chapter VI
as a problem of estimating finiteness of a quantity, rather than its smallness; the
device is the Hardy–Littlewood Maximal Theorem, and the approach becomes a
routine way of approaching almost-everywhere convergence theorems. Sections 
8-10 are of somewhat less importance and may be omitted if time is short; 
Section 10 is applied only in Section IX.6. 

Chapters VII–IX are continuations of measure theory that are largely independent
of each other. Chapter VII contains the traditional proof of the differentiation
of integrals on the line via differentiation of monotone functions. No later chapter
is logically dependent on Chapter VII; the material is included only because of its
historical importance and its usefulness as motivation for the Radon–Nikodym
Theorem in Chapter IX. Chapter VIII is an introduction to the Fourier transform
in Euclidean space. Its core consists of the first four sections, and the rest may be
considered as optional if Section IX.6 is to be omitted. Chapter IX concerns $L^p$
spaces for $1 \le p \le \infty$; only Section 6 makes use of material from Chapter VIII.

Chapter X develops, at the latest possible time in the book, the necessary part 
of point-set topology that goes beyond metric spaces. Emphasis is on product 
and quotient spaces, and on Urysohn’s Lemma concerning the construction of 
real-valued functions on normal spaces. 

Chapter XI contains one more continuation of measure theory, namely special 
features of measures on locally compact Hausdorff spaces. It provides an example 
beyond $L^p$ spaces in which one can usefully identify the dual of a particular
normed linear space. This chapter depends on Chapter X and on the first five
sections of Chapter IX but does not depend on Chapters VI-VIII.

Chapter XII is a brief introduction to functional analysis, particularly to Hilbert 
spaces, Banach spaces, and linear operators on them. The main topics are the 
geometry of Hilbert space and the three main theorems about Banach spaces. 

CHAPTER I


Theory of Calculus in One Real Variable 


Abstract. This chapter, beginning with Section 2, develops the topic of sequences and series 
of functions, especially of functions of one variable. An important part of the treatment is an 
introduction to the problem of interchange of limits, both theoretically and practically. This problem 
plays a role repeatedly in real analysis, but its visibility decreases as more and more results are 
developed for handling it in various situations. Fourier series are introduced in this chapter and are 
carried along throughout the book as a motivating example for a number of problems in real analysis. 

Section 1 makes contact with the core of a first undergraduate course in real-variable theory.
Some material from such a course is repeated here in order to establish notation and a point of view. 
Omitted material is summarized at the end of the section, and some of it is discussed in a little more 
detail in an appendix at the end of the book. The point of view being established is the use of defining 
properties of the real number system to prove the Bolzano—Weierstrass Theorem, followed by the 
use of that theorem to prove some of the difficult theorems that are usually assumed in a one-variable 
calculus course. The treatment makes use of the extended real-number system, in order to allow sup 
and inf to be defined for any nonempty set of reals and to allow lim sup and lim inf to be meaningful 
for any sequence. 

Sections 2-3 introduce the problem of interchange of limits. They show how certain concrete 
problems can be viewed in this way, and they give a way of thinking about all such interchanges in 
a common framework. A positive result affirms such an interchange under suitable hypotheses of 
monotonicity. This is by way of introduction to the topic in Section 3 of uniform convergence and 
the role of uniform convergence in continuity and differentiation. 

Section 4 gives a careful development of the Riemann integral for real-valued functions of one 
variable, establishing existence of Riemann integrals for bounded functions that are discontinuous 
at only finitely many points, basic properties of the integral, the Fundamental Theorem of Calculus 
for continuous integrands, the change-of-variables formula, and other results. Section 5 examines 
complex-valued functions, pointing out the extent to which the results for real-valued functions in 
the first four sections extend to complex-valued functions. 

Section 6 is a short treatment of the version of Taylor’s Theorem in which the remainder is given 
by an integral. Section 7 takes up power series and uses them to define the elementary transcendental 
functions and establish their properties. The power series expansion of $(1+x)^p$ for arbitrary complex
$p$ is studied carefully. Section 8 introduces Cesàro and Abel summability, which play a role in the
subject of Fourier series. A converse theorem to Abel’s theorem is used to exhibit the function $|x|$ as
the uniform limit of polynomials on $[-1, 1]$. The Weierstrass Approximation Theorem of Section 9
generalizes this example and establishes that every continuous complex-valued function on a closed 
bounded interval is the uniform limit of polynomials. 

Section 10 introduces Fourier series in one variable in the context of the Riemann integral. The 
main theorems of the section are a convergence result for continuously differentiable functions, 
Bessel’s inequality, the Riemann–Lebesgue Lemma, Fejér’s Theorem, and Parseval’s Theorem.

1. Review of Real Numbers, Sequences, Continuity


This section reviews some material that is normally in an undergraduate course 
in real analysis. The emphasis will be on a rigorous proof of the Bolzano- 
Weierstrass Theorem and its use to prove some of the difficult theorems that are 
usually assumed in a one-variable calculus course. We shall skip over some easier 
aspects of an undergraduate course in real analysis that fit logically at the end of 
this section. A list of such topics appears at the end of the section. 

The system of real numbers R may be constructed out of the system of rational 
numbers Q, and we take this construction as known. The formal definition is that 
a real number is a cut of rational numbers, i.e., a subset of rational numbers that 
is neither Q nor the empty set, has no largest element, and contains all rational 
numbers less than any rational that it contains. The idea of the construction is 
as follows: Each rational number $q$ determines a cut $q^*$, namely the set of all
rationals less than $q$. Under the identification of Q with a subset of R, the cut
defining a real number consists of all rational numbers less than the given real 
number. 

The set of cuts gets a natural ordering, given by inclusion. In place of $\subseteq$, we
write $\le$. For any two cuts $r$ and $s$, we have $r \le s$ or $s \le r$, and if both occur,
then $r = s$. We can then define $<$, $\ge$, and $>$ in the expected way. The positive
cuts $r$ are those with $0^* < r$, and the negative cuts are those with $r < 0^*$.

Once cuts and their ordering are in place, one can go about defining the usual 
operations of arithmetic and proving that R with these operations satisfies the 
familiar associative, commutative, and distributive laws, and that these interact 
with inequalities in the usual ways. The definitions of addition and subtraction 
are easy: the sum or difference of two cuts is simply the set of sums or differences 
of the rationals from the respective cuts. For multiplication and reciprocals one 
has to take signs into account. For example, the product of two positive cuts 
consists of all products of positive rationals from the two cuts, as well as 0 and all 
negative rationals. After these definitions and the proofs of the usual arithmetic 
operations are complete, it is customary to write 0 and 1 in place of 0* and 1*. 

An upper bound for a nonempty subset $E$ of R is a real number $M$ such that
$x \le M$ for all $x$ in $E$. If the nonempty set $E$ has an upper bound, we can take the
cuts that $E$ consists of and form their union. This turns out to be a cut, it is an
upper bound for $E$, and it is $\le$ all upper bounds for $E$. We can summarize this
result as a theorem. 


Theorem 1.1. Any nonempty subset $E$ of R with an upper bound has a least
upper bound. 


The least upper bound is necessarily unique, and the notation for it is $\sup_{x \in E} x$
or $\sup\{x \mid x \in E\}$, “sup” being an abbreviation for the Latin word “supremum,”
the largest. Of course, the least upper bound for a set $E$ with an upper bound
need not be in E; for example, the supremum of the negative rationals is 0, which 
is not negative. 

A lower bound for a nonempty subset $E$ of R is a real number $m$ such that $x \ge m$
for all $x \in E$. If $m$ is a lower bound for $E$, then $-m$ is an upper bound for the set
$-E$ of negatives of members of $E$. Thus $-E$ has an upper bound, and Theorem
1.1 shows that it has a least upper bound $\sup_{x \in -E} x$. The negative of this least
upper bound is a greatest lower bound for $E$. This greatest lower bound is denoted
by $\inf_{y \in E} y$ or $\inf\{y \mid y \in E\}$,
“inf” being an abbreviation for “infimum.” We can summarize as follows.


Corollary 1.2. Any nonempty subset E of R with a lower bound has a greatest 
lower bound. 


A subset of R is said to be bounded if it has an upper bound and a lower bound. 
Let us introduce notation and terminology for intervals of R, first treating the
bounded ones.¹ Let $a$ and $b$ be real numbers with $a \le b$. The open interval
from $a$ to $b$ is the set $(a,b) = \{x \in \mathbb{R} \mid a < x < b\}$, the closed interval is
the set $[a,b] = \{x \in \mathbb{R} \mid a \le x \le b\}$, and the half-open intervals are the sets
$[a,b) = \{x \in \mathbb{R} \mid a \le x < b\}$ and $(a,b] = \{x \in \mathbb{R} \mid a < x \le b\}$. Each of the
above intervals is indeed bounded, having a as a lower bound and b as an upper 
bound. These intervals are nonempty when a < b or when the interval is [a, b] 
with a = b, and in these cases the least upper bound is b and the greatest lower 
bound is a. 

Open sets in R are defined to be arbitrary unions of open bounded intervals, 
and a closed set is any set whose complement in R is open. A set $E$ is open if and
only if for each $x \in E$, there is an open interval $(a, b)$ such that $x \in (a,b) \subseteq E$.
In this case we of course have $a < x < b$. If we put $\epsilon = \min\{x - a, b - x\}$,
then we see that $x$ lies in the subset $(x - \epsilon, x + \epsilon)$ of $(a, b)$. The open interval
$(x - \epsilon, x + \epsilon)$ equals $\{y \in \mathbb{R} \mid |y - x| < \epsilon\}$. Thus an open set in R is any set $E$
such that for each $x \in E$, there is a number $\epsilon > 0$ such that $\{y \in \mathbb{R} \mid |y - x| < \epsilon\}$
lies in $E$. A limit point $x$ of a subset $E$ of R is a point of R such that any
open interval containing $x$ meets $E$ in a point other than $x$. For example, the set
$[a, b) \cup \{b + 1\}$ has $[a, b]$ as its set of limit points. A subset of R is closed if and
only if it contains all its limit points. 
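
The choice $\epsilon = \min\{x - a, b - x\}$ in the paragraph above can be checked on a concrete interval. The following Python sketch is only an illustration: the function name `radius_inside` and the sample values are assumptions made here, and exact rationals stand in for real numbers.

```python
from fractions import Fraction


def radius_inside(a: Fraction, b: Fraction, x: Fraction) -> Fraction:
    """For a < x < b, return eps = min(x - a, b - x)."""
    if not a < x < b:
        raise ValueError("need a < x < b")
    return min(x - a, b - x)


# Sample values (hypothetical): the point 1/4 in the open interval (0, 1).
a, b, x = Fraction(0), Fraction(1), Fraction(1, 4)
eps = radius_inside(a, b, x)
# The endpoints of (x - eps, x + eps) stay within [a, b],
# so the open interval (x - eps, x + eps) is a subset of (a, b).
contained = a <= x - eps and x + eps <= b
```

Here the smaller of the two distances to the endpoints is the one that controls the radius, which is why the minimum appears in the definition of $\epsilon$.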

Now let us turn to unbounded intervals. To provide notation for these, we shall 
make use of two symbols $+\infty$ and $-\infty$ that will shortly be defined to be “extended
real numbers.” If $a$ is in R, then the subsets $(a,+\infty) = \{x \in \mathbb{R} \mid a < x\}$,
$(-\infty,a) = \{x \in \mathbb{R} \mid x < a\}$, $(-\infty, +\infty) = \mathbb{R}$, $[a, +\infty) = \{x \in \mathbb{R} \mid a \le x\}$,
and $(-\infty, a] = \{x \in \mathbb{R} \mid x \le a\}$ are defined to be intervals, and they are all
unbounded. The first three are open sets of R and are considered to be open
intervals, while the last three are closed sets and are considered to be closed
intervals. Specifically the middle set R is both open and closed.

¹Bounded intervals are called “finite intervals” by some authors.

One important consequence of Theorem 1.1 is the archimedean property of 
R, as follows. 


Corollary 1.3. If a and b are real numbers with a > 0, then there exists an 
integer n with na > b. 


PROOF. If, on the contrary, $na \le b$ for all integers $n$, then $b$ is an upper bound
for the set of all $na$. Let $M$ be the least upper bound of the set $\{na \mid n \text{ is an integer}\}$.
Using that $a$ is positive, we find that $a^{-1}M$ is a least upper bound for the integers.
Thus $n \le a^{-1}M$ for all integers $n$, and there is no smaller upper bound. However,
the smaller number $a^{-1}M - 1$ must be an upper bound, since saying $n \le a^{-1}M$
for all integers is the same as saying $n - 1 \le a^{-1}M - 1$ for all integers. We arrive
at a contradiction, and we conclude that there is some integer $n$ with $na > b$.
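
For rational inputs the integer promised by Corollary 1.3 can be written down explicitly. The sketch below is an illustration under stated assumptions, not a rendering of the proof: the name `archimedean_multiple` is hypothetical, exact rationals stand in for real numbers, and the code leans on the built-in floor function instead of the least-upper-bound argument.

```python
from fractions import Fraction
from math import floor


def archimedean_multiple(a: Fraction, b: Fraction) -> int:
    """Return an integer n with n * a > b, assuming a > 0."""
    if a <= 0:
        raise ValueError("need a > 0")
    # floor(b / a) <= b / a < floor(b / a) + 1, so n = floor(b / a) + 1 works.
    return floor(b / a) + 1


# Sample values (hypothetical): a = 1/3 and b = 5.
n = archimedean_multiple(Fraction(1, 3), Fraction(5))
```

The proof in the text is nonconstructive and works for all real $a > 0$ and $b$; the formula in the sketch is available only because division and floor are computable for rationals.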


The archimedean property enables one to see, for example, that any two 
distinct real numbers have a rational number lying between them. We prove 
this consequence as Corollary 1.5 after isolating one step as Corollary 1.4. 


Corollary 1.4. If $c$ is a real number, then there exists an integer $n$ such that
$n \le c < n + 1$.


PROOF. Corollary 1.3 with $a = 1$ and $b = c$ shows that there is an integer $M$
with $M > c$, and Corollary 1.3 with $a = 1$ and $b = -c$ shows that there is an
integer $m$ with $m > -c$. Then $-m < c < M$, and it follows that there exists a
greatest integer $n$ with $n \le c$. This $n$ must have the property that $c < n + 1$, and
the corollary follows.


Corollary 1.5. If $x$ and $y$ are real numbers with $x < y$, then there exists a
rational number $r$ with $x < r < y$.


PROOF. By Corollary 1.3 with $a = y - x$ and $b = 1$, there is an integer $N$
such that $N(y - x) > 1$. This integer $N$ has to be positive. Then $\frac{1}{N} < y - x$.
By Corollary 1.4 with $c = Nx$, there exists an integer $n$ with $n \le Nx < n + 1$,
hence with $\frac{n}{N} \le x < \frac{n+1}{N}$. Adding the inequalities $\frac{n}{N} \le x$ and $\frac{1}{N} < y - x$ yields
$\frac{n+1}{N} < y$. Thus $x < \frac{n+1}{N} < y$, and the rational number
$r = \frac{n+1}{N}$ has the required properties.
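
The steps of this argument can be traced on explicit numbers: choose $N$ with $N(y - x) > 1$, take the integer $n$ with $n \le Nx < n + 1$, and check that $(n+1)/N$ lies strictly between $x$ and $y$. The Python sketch below does this under stated assumptions: the function name `rational_between` is hypothetical, and the inputs are exact rationals standing in for real numbers, since floating-point values cannot represent arbitrary reals.

```python
from fractions import Fraction
from math import floor


def rational_between(x: Fraction, y: Fraction) -> Fraction:
    """Return a rational r with x < r < y, assuming x < y."""
    if not x < y:
        raise ValueError("need x < y")
    # Archimedean step (Corollary 1.3): a positive integer N with N * (y - x) > 1.
    N = floor(1 / (y - x)) + 1
    # Corollary 1.4 with c = N * x: the integer n with n <= N * x < n + 1.
    n = floor(N * x)
    # Then n/N <= x < (n + 1)/N, and adding 1/N < y - x gives (n + 1)/N < y.
    return Fraction(n + 1, N)


# Sample values (hypothetical): x = 1/3 and y = 1/2.
x, y = Fraction(1, 3), Fraction(1, 2)
r = rational_between(x, y)
```

With these sample values the code finds $N = 7$ and $n = 2$, so that $r = 3/7$.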


A sequence in a set S is a function from a certain kind of subset of integers into 
S. It will be assumed that the set of integers is nonempty, consists of consecutive 
integers, and contains no largest integer. In particular the domain of any sequence 
is infinite. Usually the set of integers is either all nonnegative integers or all 
positive integers. Sometimes the set of integers is all integers, and the sequence
in this case is often called “doubly infinite.” The value of a sequence $f$ at the
integer $n$ is normally written $f_n$ rather than $f(n)$, and the sequence itself may be
denoted by an expression like $\{f_n\}_{n \ge 1}$, in which the outer subscript indicates the
domain. 

A subsequence of a sequence $f$ with domain $\{m, m+1, \dots\}$ is a composition
$f \circ n$, where $f$ is a sequence and $n$ is a sequence in the domain of $f$ such that
$n_k < n_{k+1}$ for all $k$. For example, if $\{a_n\}_{n \ge 1}$ is a sequence, then $\{a_{2k}\}_{k \ge 1}$ is the
subsequence in which the function $n$ is given by $n_k = 2k$. The domain of a
subsequence, by our definition, is always infinite. 
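
Because a sequence is a function on integers, a subsequence is literally a composition of two functions, and this can be mirrored directly in code. The following Python sketch is illustrative only: the helper `subsequence` and the sample sequence $a_n = 1/n$ are assumptions made here.

```python
from fractions import Fraction
from typing import Callable


def subsequence(f: Callable[[int], Fraction],
                n: Callable[[int], int]) -> Callable[[int], Fraction]:
    """Return the composition of f with n, i.e. the map k -> f(n(k))."""
    return lambda k: f(n(k))


def a(n: int) -> Fraction:
    """Sample sequence a_n = 1/n with domain n >= 1."""
    return Fraction(1, n)


def index(k: int) -> int:
    """Index sequence n_k = 2k, which satisfies n_k < n_{k+1}."""
    return 2 * k


b = subsequence(a, index)              # b_k = a_{2k}
first_terms = [b(k) for k in range(1, 5)]
```

The requirement $n_k < n_{k+1}$ is a property of the index function alone; the helper does not check it, so a caller must supply a strictly increasing `index`.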

A sequence $a_n$ in R is convergent, or convergent in R, if there exists a real
number $a$ such that for each $\epsilon > 0$, there is an integer $N$ with $|a_n - a| < \epsilon$
for all $n \ge N$. The number $a$ is necessarily unique and is called the limit
of the sequence. Depending on how much information about the sequence is
unambiguous, we may write $\lim_{n \to \infty} a_n = a$ or $\lim_n a_n = a$ or $\lim a_n = a$ or
$a_n \to a$. We also say $a_n$ tends to $a$ as $n$ tends to infinity or $\infty$.

A sequence in R is called monotone increasing if $a_n \le a_{n+1}$ for all $n$ in the
domain, monotone decreasing if $a_n \ge a_{n+1}$ for all $n$ in the domain, and monotone
if it is monotone increasing or monotone decreasing.


Corollary 1.6. Any bounded monotone sequence in R converges. If the 
sequence is monotone increasing, then the limit is the least upper bound of the 
image in R of the sequence. If the sequence is monotone decreasing, the limit is 
the greatest lower bound of the image. 


REMARK. Often it is Corollary 1.6, rather than the existence of least upper 
bounds, that is taken for granted in an elementary calculus course. The reason 
is that the statement of Corollary 1.6 tends for calculus students to be easier to 
understand than the statement of the least upper bound property. Problem 1 at the 
end of the chapter asks for a derivation of the least-upper-bound property from 
Corollary 1.6. 


PROOF. Suppose that $\{a_n\}$ is monotone increasing and bounded. Let $a = \sup_n a_n$,
the existence of the supremum being ensured by Theorem 1.1, and let
$\epsilon > 0$ be given. If there were no integer $N$ with $a_N > a - \epsilon$, then $a - \epsilon$ would be
a smaller upper bound, contradiction. Thus such an $N$ exists. For that $N$, $n \ge N$
implies $a - \epsilon < a_N \le a_n \le a < a + \epsilon$. Thus $n \ge N$ implies $|a_n - a| < \epsilon$.
Since $\epsilon$ is arbitrary, $\lim_{n\to\infty} a_n = a$. If the given sequence $\{a_n\}$ is monotone
decreasing, we argue similarly with $a = \inf_n a_n$.


In working with sup and inf, it will be quite convenient to use the notation
$\sup_{x\in E} x$ even when $E$ is nonempty but not bounded above, and to use the notation
$\inf_{x\in E} x$ even when $E$ is nonempty but not bounded below. We introduce symbols
$+\infty$ and $-\infty$, plus and minus infinity, for this purpose and extend the definitions
of $\sup_{x\in E} x$ and $\inf_{x\in E} x$ to all nonempty subsets $E$ of R by taking

$$\sup_{x\in E} x = +\infty \quad \text{if } E \text{ has no upper bound},$$

$$\inf_{x\in E} x = -\infty \quad \text{if } E \text{ has no lower bound}.$$


To work effectively with these new pieces of notation, we shall enlarge R to a 
set R* called the extended real numbers by defining 


$$\mathbb{R}^* = \mathbb{R} \cup \{+\infty\} \cup \{-\infty\}.$$


An ordering on R* is defined by taking $-\infty < r < +\infty$ for every member $r$ of R
and by retaining the usual ordering within R. It is immediate from this definition
that

$$\inf_{x\in E} x \le \sup_{x\in E} x$$

if $E$ is any nonempty subset of R. In fact, we can enlarge the definitions of $\inf_{x\in E} x$
and $\sup_{x\in E} x$ in obvious fashion to include the case that $E$ is any nonempty
subset of R*, and we still have $\inf \le \sup$. With the ordering in place, we can
unambiguously speak of open intervals $(a, b)$, closed intervals $[a, b]$, and half-open
intervals $[a, b)$ and $(a, b]$ in R* even if $a$ or $b$ is infinite. Under our
definitions the intervals of R are the intervals of R* that are subsets of R, even if
$a$ or $b$ is infinite. If no special mention is made whether an interval lies in R or
R*, it is usually assumed to lie in R.

The next step is to extend the operations of arithmetic to R*. It is important 
not to try to make such operations be everywhere defined, lest the distributive 
laws fail. Letting r denote any member of R and a and b be any members of R*, 
we make the following new definitions: 


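Multiplication:

$$r(+\infty) = (+\infty)r = \begin{cases} +\infty & \text{if } r > 0, \\ 0 & \text{if } r = 0, \\ -\infty & \text{if } r < 0, \end{cases}$$

$$r(-\infty) = (-\infty)r = \begin{cases} -\infty & \text{if } r > 0, \\ 0 & \text{if } r = 0, \\ +\infty & \text{if } r < 0, \end{cases}$$

$$(+\infty)(+\infty) = (-\infty)(-\infty) = +\infty,$$

$$(+\infty)(-\infty) = (-\infty)(+\infty) = -\infty.$$

Addition:

$$r + (+\infty) = (+\infty) + r = +\infty,$$

$$r + (-\infty) = (-\infty) + r = -\infty,$$

$$(+\infty) + (+\infty) = +\infty,$$

$$(-\infty) + (-\infty) = -\infty.$$

Subtraction: $a - b = a + (-b)$ whenever the right side is defined.

Division: $a/b = 0$ if $a \in \mathbb{R}$ and $b$ is $\pm\infty$,

$a/b = b^{-1}a$ if $b \in \mathbb{R}$ with $b \ne 0$ and $a$ is $\pm\infty$.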


The only surprise in the list is that 0 times anything is 0. This definition will be 
important to us when we get to measure theory, starting in Chapter V. 
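
The conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `ext_mul` and `ext_add` are hypothetical, and IEEE floating-point infinities stand in for $+\infty$ and $-\infty$. Floating-point arithmetic by itself gives `nan` for `0 * inf`, so the convention that 0 times anything is 0 has to be imposed by hand.

```python
import math

INF = math.inf  # stands in for +infinity; -INF stands in for -infinity


def ext_mul(a: float, b: float) -> float:
    """Multiply in R* using the convention that 0 times anything is 0."""
    if a == 0 or b == 0:
        return 0.0
    # For nonzero operands, float multiplication already follows the sign rules.
    return a * b


def ext_add(a: float, b: float) -> float:
    """Add in R*; the sum of +infinity and -infinity is deliberately left undefined."""
    if math.isinf(a) and math.isinf(b) and a != b:
        raise ValueError("(+inf) + (-inf) is not defined in R*")
    return a + b
```

For instance, `ext_mul(0.0, INF)` returns `0.0`, whereas plain `0.0 * INF` would return `nan`.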

It is now a simple matter to define convergence of a sequence in R*. The cases
that need addressing are that the sequence is in R and that the limit is $+\infty$ or $-\infty$.
We say that a sequence $\{a_n\}$ in R tends to $+\infty$ if for any positive number $M$, there
exists an integer $N$ such that $a_n \ge M$ for all $n \ge N$. The sequence tends to $-\infty$
if for any negative number $-M$, there exists an integer $N$ such that $a_n \le -M$
for all $n \ge N$. It is important to indicate whether convergence/divergence of a
sequence is being discussed in R or in R*. The default setting is R, in keeping with
standard terminology in calculus. Thus, for example, we say that the sequence
$\{n\}_{n\ge 1}$ diverges, but it converges in R* (to $+\infty$).

With our new definitions every monotone sequence converges in R*. 

For a sequence $\{a_n\}$ in R or even in R*, we now introduce members $\limsup_n a_n$
and $\liminf_n a_n$ of R*. These will always be defined, and thus we can apply the
operations lim sup and lim inf to any sequence in R*. For the case of lim sup
we define $b_n = \sup_{k\ge n} a_k$ as a sequence in R*. The sequence $\{b_n\}$ is monotone
decreasing. Thus it converges to $\inf_n b_n$ in R*. We define

$$\limsup_n a_n = \inf_n \sup_{k\ge n} a_k$$

as a member of R*, and we define

$$\liminf_n a_n = \sup_n \inf_{k\ge n} a_k$$

as a member of R*. Let us underscore that $\limsup a_n$ and $\liminf a_n$ always exist.
However, one or both may be $\pm\infty$ even if $a_n$ is in R for every $n$.
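
As a rough numerical illustration of these definitions, the sketch below computes the tail suprema $b_n = \sup_{k\ge n} a_k$ and the tail infima for a finite truncation of the sequence $a_n = (-1)^n(1 + 1/n)$, whose lim sup is 1 and whose lim inf is $-1$. The sequence, the truncation length, and the helper names `tail_sups` and `tail_infs` are assumptions made for the example. A finite list can only suggest the limiting values, and the entries near the end of the list are distorted by the truncation.

```python
def tail_sups(a):
    """Return b with b[n] = max(a[n:]), a finite stand-in for sup over k >= n."""
    b = list(a)
    for n in range(len(a) - 2, -1, -1):
        b[n] = max(a[n], b[n + 1])
    return b


def tail_infs(a):
    """Return c with c[n] = min(a[n:]), a finite stand-in for inf over k >= n."""
    c = list(a)
    for n in range(len(a) - 2, -1, -1):
        c[n] = min(a[n], c[n + 1])
    return c


# a[i] holds the term with index n = i + 1
a = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
b = tail_sups(a)  # monotone decreasing, drifting down toward lim sup = 1
c = tail_infs(a)  # monotone increasing, drifting up toward lim inf = -1
```

Halfway through the list, `b[1000]` and `c[1000]` are already within about 0.001 of 1 and $-1$.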


Proposition 1.7. The operations lim sup and lim inf on sequences $\{a_n\}$ and
$\{b_n\}$ in R* have the following properties:

(a) if $a_n \le b_n$ for all $n$, then $\limsup a_n \le \limsup b_n$ and $\liminf a_n \le \liminf b_n$,


The notation $\overline{\lim}$ was at one time used for lim sup, and $\underline{\lim}$ was used for lim inf.
(b) $\liminf a_n \le \limsup a_n$,

(c) $\{a_n\}$ has a subsequence converging in R* to $\limsup a_n$ and another subsequence
converging in R* to $\liminf a_n$,

(d) $\limsup a_n$ is the supremum of all subsequential limits of $\{a_n\}$ in R*, and
$\liminf a_n$ is the infimum of all subsequential limits of $\{a_n\}$ in R*,

(e) if $\limsup a_n < +\infty$, then $\limsup a_n$ is the infimum of all extended real
numbers $a$ such that $a_n \ge a$ for only finitely many $n$, and if $\liminf a_n > -\infty$,
then $\liminf a_n$ is the supremum of all extended real numbers $a$ such
that $a_n \le a$ for only finitely many $n$,

(f) the sequence $\{a_n\}$ in R* converges in R* if and only if $\liminf a_n = \limsup a_n$,
and in this case the limit is the common value of $\liminf a_n$ and
$\limsup a_n$.


REMARK. It is enough to prove the results about lim sup, since $\liminf a_n = -\limsup(-a_n)$.


PROOFS FOR lim sup.

(a) From $a_l \le b_l$ for all $l$, we have $a_l \le \sup_{k\ge n} b_k$ if $l \ge n$. Hence $\sup_{l\ge n} a_l \le \sup_{k\ge n} b_k$.
Then (a) follows by taking the limit on $n$.

(b) This follows by taking the limit on $n$ of the inequality $\inf_{k\ge n} a_k \le \sup_{k\ge n} a_k$.

(c) We divide matters into cases. The main case is that $a = \limsup a_n$ is in R.
Inductively, for each $l \ge 1$, choose $N > n_{l-1}$ such that $|\sup_{k\ge N} a_k - a| < l^{-1}$.
Then choose $n_l > n_{l-1}$ such that $|a_{n_l} - \sup_{k\ge N} a_k| < l^{-1}$. Together these
inequalities imply $|a_{n_l} - a| < 2l^{-1}$ for all $l$, and thus $\lim_{l\to\infty} a_{n_l} = a$. The
second case is that $a = \limsup a_n$ equals $+\infty$. Since $\sup_{k\ge n} a_k$ is monotone
decreasing in $n$, we must have $\sup_{k\ge n} a_k = +\infty$ for all $n$. Inductively for $l \ge 1$,
we can choose $n_l > n_{l-1}$ such that $a_{n_l} \ge l$. Then $\lim_{l\to\infty} a_{n_l} = +\infty$. The
third case is that $a = \limsup a_n$ equals $-\infty$. The sequence $b_n = \sup_{k\ge n} a_k$ is
monotone decreasing to $-\infty$. Inductively for $l \ge 1$, choose $n_l > n_{l-1}$ such that
$b_{n_l} \le -l$. Then $a_{n_l} \le b_{n_l} \le -l$, and $\lim_{l\to\infty} a_{n_l} = -\infty$.

(d) By (c), $\limsup a_n$ is one subsequential limit. Let $a = \lim_{k\to\infty} a_{n_k}$ be another
subsequential limit. Put $b_n = \sup_{l\ge n} a_l$. Then $\{b_n\}$ converges to $\limsup a_n$
in R*, and the same thing is true of every subsequence. Since $a_{n_k} \le \sup_{l\ge n_k} a_l = b_{n_k}$
for all $k$, we can let $k$ tend to infinity and obtain $a = \lim_{k\to\infty} a_{n_k} \le \lim_{k\to\infty} b_{n_k} = \limsup a_n$.

(e) Since $\limsup a_n < +\infty$, we have $\sup_{k\ge n} a_k < +\infty$ for $n$ greater than or
equal to some $N$. For this $N$ and any $a > \sup_{k\ge N} a_k$, we then have $a_n \ge a$ only
finitely often. Thus there exists $a \in \mathbb{R}$ such that $a_n \ge a$ for only finitely many $n$.
On the other hand, if $a'$ is a real number $< \limsup a_n$, then (c) shows that $a_n \ge a'$
for infinitely many $n$. Hence

$$\limsup a_n \le \inf \{a \mid a_n \ge a \text{ for only finitely many } n\}.$$

Arguing by contradiction, suppose that $<$ holds in this inequality, and let $a''$ be a
real number strictly in between the two sides of the inequality. Then $\sup_{k\ge n} a_k < a''$
for $n$ large enough, and so $a_n \ge a''$ only finitely often. But then $a''$ is in the set

$$\{a \mid a_n \ge a \text{ for only finitely many } n\},$$

and the statement that $a''$ is less than the infimum of this set gives a contradiction.

(f) If $\{a_n\}$ converges in R*, then (c) forces $\liminf a_n = \limsup a_n$. Conversely
suppose $\liminf a_n = \limsup a_n$, and let $a$ be the common value of $\liminf a_n$ and
$\limsup a_n$. The main case is that $a$ is in R. Let $\epsilon > 0$ be given. By (e), $a_n \ge a + \epsilon$
only finitely often, and $a_n \le a - \epsilon$ only finitely often. Thus $|a_n - a| < \epsilon$ for
all $n$ sufficiently large. In other words, $\lim a_n = a$ as asserted. The other cases
are that $a = +\infty$ or $a = -\infty$, and they are completely analogous to each other.
Suppose for definiteness that $a = +\infty$. Since $\liminf a_n = +\infty$, the monotone
increasing sequence $b_n = \inf_{k\ge n} a_k$ converges in R* to $+\infty$. Given $M$, choose
$N$ such that $b_n \ge M$ for $n \ge N$. Then also $a_n \ge M$ for $n \ge N$, and $a_n$ converges
in R* to $+\infty$. This completes the proof.


With Proposition 1.7 as a tool, we can now prove the Bolzano–Weierstrass Theorem.
The remainder of the section will consist of applications of this theorem,
showing that Cauchy sequences in R converge in R, that continuous functions
on closed bounded intervals of R are uniformly continuous, that continuous
functions on closed bounded intervals are bounded and assume their maximum
and minimum values, and that continuous functions on closed intervals take on
all intermediate values.


Theorem 1.8 (Bolzano–Weierstrass). Every bounded sequence in R has a
convergent subsequence with limit in R.


PROOF. If the given bounded sequence is $\{a_n\}$, form the subsequence noted in
Proposition 1.7c that converges in R* to $\limsup a_n$. All quantities arising in the
formation of $\limsup a_n$ are in R, since $\{a_n\}$ is bounded, and thus the limit is in R.


A sequence $\{a_n\}$ in R is called a Cauchy sequence if for any $\epsilon > 0$, there
exists an $N$ such that $|a_n - a_m| < \epsilon$ for all $n$ and $m$ that are $\ge N$.


EXAMPLE. Every convergent sequence in R with limit in R is Cauchy. In fact,
let $a = \lim a_n$, and let $\epsilon > 0$ be given. Choose $N$ such that $n \ge N$ implies
$|a_n - a| < \epsilon$. Then $n, m \ge N$ implies

$$|a_n - a_m| \le |a_n - a| + |a - a_m| < \epsilon + \epsilon = 2\epsilon.$$

Hence the sequence is Cauchy.

In the above example and elsewhere in this book, we allow ourselves the luxury
of having our final bound come out as a fixed multiple $M\epsilon$ of $\epsilon$, rather than $\epsilon$
itself. Strictly speaking, we should have introduced $\epsilon' = \epsilon/M$ and aimed for
$\epsilon'$ rather than $\epsilon$. Then our final bound would have been $M\epsilon' = \epsilon$. Since the
technique for adjusting a proof in this way is always the same, we shall not add
these extra steps in the future unless there would otherwise be a possibility of
confusion.

This convention suggests a handy piece of terminology: a proof as in the
above example, in which $M = 2$, is a “$2\epsilon$ proof.” That name conveys a great deal
of information about the proof, saying that one should expect two contributions
to the final estimate and that the final bound will be $2\epsilon$.


Theorem 1.9 (Cauchy criterion). Every Cauchy sequence in R converges to a 
limit in R. 

PROOF. Let $\{a_n\}$ be Cauchy in R. First let us see that $\{a_n\}$ is bounded. In
fact, for $\epsilon = 1$, choose $N$ such that $n, m \ge N$ implies $|a_n - a_m| < 1$. Then
$|a_m| \le |a_N| + 1$ for $m \ge N$, and $M = \max\{|a_1|, \dots, |a_{N-1}|, |a_N| + 1\}$ is a
common bound for all $|a_n|$.

Since $\{a_n\}$ is bounded, it has a convergent subsequence $\{a_{n_k}\}$, say with limit
$a$, by the Bolzano–Weierstrass Theorem. The subsequential limit has to satisfy
$|a| \le M$ within R*, and thus $a$ is in R.

Finally let us see that $\lim a_n = a$. In fact, if $\epsilon > 0$ is given, choose $N$ such
that $n_k \ge N$ implies $|a_{n_k} - a| < \epsilon$. Also, choose $N' \ge N$ such that $n, m \ge N'$
implies $|a_n - a_m| < \epsilon$. If $n \ge N'$, then any $n_k \ge N'$ has $|a_n - a_{n_k}| < \epsilon$, and
hence

$$|a_n - a| \le |a_n - a_{n_k}| + |a_{n_k} - a| < \epsilon + \epsilon = 2\epsilon.$$

This completes the proof.


Let $f$ be a function with domain an interval and with range in R. The interval
is allowed to be unbounded, but it is required to be a subset of R. We say
that $f$ is continuous at a point $x_0$ of the domain of $f$ within R if for each
$\epsilon > 0$, there is some $\delta > 0$ such that all $x$ in the domain of $f$ that satisfy
$|x - x_0| < \delta$ have $|f(x) - f(x_0)| < \epsilon$. This notion is sometimes abbreviated as
$\lim_{x\to x_0} f(x) = f(x_0)$. Alternatively, one may say that $f(x)$ tends to $f(x_0)$ as
$x$ tends to $x_0$, and one may write $f(x) \to f(x_0)$ as $x \to x_0$.

A mathematically equivalent definition is that $f$ is continuous at $x_0$ if whenever
a sequence has $x_n \to x_0$ within the domain interval, then $f(x_n) \to f(x_0)$. This
latter version of continuity will be shown in Section II.4 to be equivalent to the
former version, given in terms of continuous limits, in greater generality than just
for R, and thus we shall not stop to prove the equivalence now. We say that $f$ is
continuous if it is continuous at all points of its domain.

We say that a function $f$ as above is uniformly continuous on its domain
if for any $\epsilon > 0$, there is some $\delta > 0$ such that $|f(x) - f(x_0)| < \epsilon$ whenever $x$
and $x_0$ are in the domain interval and $|x - x_0| < \delta$. (In other words, the condition
for the continuity to be uniform is that $\delta$ can always be chosen independently of
$x_0$.)


EXAMPLE. The function $f(x) = x^2$ is continuous on $(-\infty, +\infty)$, but it is
not uniformly continuous. In fact, it is not uniformly continuous on $[1, +\infty)$.
Assuming the contrary, choose $\delta$ for $\epsilon = 1$. Then we must have $|(x + \frac{\delta}{2})^2 - x^2| < 1$
for all $x \ge 1$. But $|(x + \frac{\delta}{2})^2 - x^2| = \delta x + \frac{\delta^2}{4} \ge \delta x$, and this is $\ge 1$ for $x \ge \delta^{-1}$.
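
The computation in this example can be checked numerically. The sketch below is only an illustration, with a hypothetical helper `gap` and a few arbitrarily chosen values of $\delta$: for each $\delta$ it takes $x = \delta^{-1}$ and evaluates the difference of $f(x) = x^2$ at two points that are only $\delta/2$ apart.

```python
def gap(x: float, delta: float) -> float:
    """Return |(x + delta/2)^2 - x^2|, the change in x^2 across a step of delta/2."""
    return abs((x + delta / 2) ** 2 - x ** 2)


deltas = [1.0, 0.1, 0.001]
# At x = 1/delta the gap equals 1 + delta^2/4, so it is at least 1 however small delta is.
gaps = [gap(1 / d, d) for d in deltas]
```

Every entry of `gaps` is at least 1, so no single $\delta$ can serve $\epsilon = 1$ on all of $[1, +\infty)$.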


Theorem 1.10. A continuous function f from a closed bounded interval [a, b] 
into R is uniformly continuous. 


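PROOF. Fix $\epsilon > 0$. For $x_0$ in the domain of $f$, the continuity of $f$ at $x_0$ means
that it makes sense to define

$$\delta_{x_0}(\epsilon) = \min\Big\{1,\ \sup\big\{\delta' > 0 \;\big|\; |x - x_0| < \delta' \text{ and } x \text{ in the domain of } f \text{ imply } |f(x) - f(x_0)| < \epsilon\big\}\Big\}.$$

If $|x - x_0| < \delta_{x_0}(\epsilon)$, then $|f(x) - f(x_0)| < \epsilon$. Put $\delta(\epsilon) = \inf_{x_0\in[a,b]} \delta_{x_0}(\epsilon)$.
Let us see that it is enough to prove that $\delta(\epsilon) > 0$. If $x$ and $y$ are in $[a, b]$ with
$|x - y| < \delta(\epsilon)$, then $|x - y| < \delta(\epsilon) \le \delta_y(\epsilon)$. Hence $|f(x) - f(y)| < \epsilon$ as
required.

Thus we are to prove that $\delta(\epsilon) > 0$. If $\delta(\epsilon) = 0$, then, for each integer
$n > 0$, we can choose $x_n$ such that $\delta_{x_n}(\epsilon) < \frac{1}{n}$. By the Bolzano–Weierstrass
Theorem, there is a convergent subsequence, say with $x_{n_k} \to x'$. Along this
subsequence, $\delta_{x_{n_k}}(\epsilon) \to 0$. Fix $k$ large enough so that $|x_{n_k} - x'| < \frac{1}{2}\delta_{x'}(\frac{\epsilon}{2})$. Then
$|f(x_{n_k}) - f(x')| < \frac{\epsilon}{2}$. Also, $|x - x_{n_k}| < \frac{1}{2}\delta_{x'}(\frac{\epsilon}{2})$ implies

$$|x - x'| \le |x - x_{n_k}| + |x_{n_k} - x'| < \tfrac{1}{2}\delta_{x'}(\tfrac{\epsilon}{2}) + \tfrac{1}{2}\delta_{x'}(\tfrac{\epsilon}{2}) = \delta_{x'}(\tfrac{\epsilon}{2}),$$

so that $|f(x) - f(x')| < \frac{\epsilon}{2}$ and

$$|f(x) - f(x_{n_k})| \le |f(x) - f(x')| + |f(x') - f(x_{n_k})| < \tfrac{\epsilon}{2} + \tfrac{\epsilon}{2} = \epsilon.$$

Consequently our arbitrary large fixed $k$ has $\delta_{x_{n_k}}(\epsilon) \ge \frac{1}{2}\delta_{x'}(\frac{\epsilon}{2})$, and the sequence
$\{\delta_{x_{n_k}}(\epsilon)\}$ cannot be tending to 0.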


Theorem 1.11. A continuous function f from a closed bounded interval [a, b] 
into R is bounded and takes on maximum and minimum values. 


PROOF. Let $c = \sup_{x\in[a,b]} f(x)$ in R*. Choose a sequence $x_n$ in $[a, b]$
with $f(x_n)$ increasing to $c$. By the Bolzano–Weierstrass Theorem, $\{x_n\}$ has a
convergent subsequence, say $x_{n_k} \to x'$. By continuity, $f(x_{n_k}) \to f(x')$. Then
$f(x') = c$, and $c$ is a finite maximum. The proof for a finite minimum is similar.


Theorem 1.12 (Intermediate Value Theorem). Let $a < b$ be real numbers,
and let $f : [a, b] \to \mathbb{R}$ be continuous. Then $f$, in the interval $[a, b]$, takes on all
values between $f(a)$ and $f(b)$.


REMARK. The proof below, which uses the Bolzano–Weierstrass Theorem,
does not make absolutely clear what aspects of the structure of R are essential to 
the argument. A conceptually clearer proof will be given in Section II.8 and will 
bring out that the essential property of the interval [a, b] is its “connectedness” 
in a sense to be defined in that section. 


PROOF. Let $f(a) = \alpha$ and $f(b) = \beta$, and let $\gamma$ be between $\alpha$ and $\beta$. We may
assume that $\gamma$ is in fact strictly between $\alpha$ and $\beta$. Possibly by replacing $f$ by
$-f$, we may assume that also $\alpha < \beta$. Let

$$A = \{x \in [a, b] \mid f(x) \le \gamma\} \quad \text{and} \quad B = \{x \in [a, b] \mid f(x) \ge \gamma\}.$$

These sets are nonempty, since $a$ is in $A$ and $b$ is in $B$, and $f$ is bounded as
a result of Theorem 1.11. Thus the numbers $\gamma_1 = \sup\{f(x) \mid x \in A\}$ and
$\gamma_2 = \inf\{f(x) \mid x \in B\}$ are well defined and have $\gamma_1 \le \gamma \le \gamma_2$.

If $\gamma_1 = \gamma$, then we can find a sequence $\{x_n\}$ in $A$ such that $f(x_n)$ converges to $\gamma$.
Using the Bolzano–Weierstrass Theorem, we can find a convergent subsequence
$\{x_{n_k}\}$ of $\{x_n\}$, say with limit $x_0$. By continuity of $f$, $\{f(x_{n_k})\}$ converges to $f(x_0)$.
Then $f(x_0) = \gamma_1 = \gamma$, and we are done. Arguing by contradiction, we may
therefore assume that $\gamma_1 < \gamma$. Similarly we may assume that $\gamma < \gamma_2$, but we do
not need to do so.

Let $\epsilon = \gamma_2 - \gamma_1$, and choose, by Theorem 1.10 and uniform continuity, $\delta > 0$
such that $|x_1 - x_2| < \delta$ implies $|f(x_1) - f(x_2)| < \epsilon$ whenever $x_1$ and $x_2$ both
lie in $[a, b]$. Then choose an integer $n$ such that $2^{-n}(b - a) < \delta$, and consider
the value of $f$ at the points $p_k = a + k2^{-n}(b - a)$ for $0 \le k \le 2^n$. Since
$p_{k+1} - p_k = 2^{-n}(b - a) < \delta$, we have $|f(p_{k+1}) - f(p_k)| < \epsilon = \gamma_2 - \gamma_1$.
Consequently if $f(p_k) \le \gamma_1$, then

$$f(p_{k+1}) \le f(p_k) + |f(p_{k+1}) - f(p_k)| < \gamma_1 + (\gamma_2 - \gamma_1) = \gamma_2,$$

and hence $f(p_{k+1}) \le \gamma_1$. Now $f(p_0) = f(a) = \alpha \le \gamma_1$. Thus induction shows
that $f(p_k) \le \gamma_1$ for all $k \le 2^n$. However, for $k = 2^n$, we have $p_{2^n} = b$, and
$f(b) = \beta \ge \gamma > \gamma_1$, and we have arrived at a contradiction.


Further topics. Here a number of other topics of an undergraduate course in real-variable theory 
fit well logically. Among these are countable vs. uncountable sets, infinite series and tests for their 
convergence, the fact that every rearrangement of an infinite series of positive terms has the same 
sum, special sequences, derivatives, the Mean Value Theorem as in Section A2 of the appendix, 
and continuity and differentiability of inverse functions as in Section A3 of the appendix. We shall 
not stop here to review these topics, which are treated in many books. One such book is Rudin’s 
Principles of Mathematical Analysis, the relevant chapters being 1 to 5. In Chapter 2 of that book,
only the first few pages are needed; they are the ones where countable and uncountable sets are 
discussed. 

2. Interchange of Limits


Let $\{b_{ij}\}$ be a doubly indexed sequence of real numbers. It is natural to ask for
the extent to which

$$\lim_i \lim_j b_{ij} = \lim_j \lim_i b_{ij},$$

more specifically to ask how to tell, in an expression involving iterated limits,
whether we can interchange the order of the two limit operations. We can view
matters conveniently in terms of an infinite matrix

$$\begin{pmatrix} b_{11} & b_{12} & \cdots \\ b_{21} & b_{22} & \cdots \\ \vdots & & \ddots \end{pmatrix}.$$


The left-hand iterated limit, namely $\lim_i \lim_j b_{ij}$, is obtained by forming the limit
of each row, assembling the results, and then taking the limit of the row limits
down through the rows. The right-hand iterated limit, namely $\lim_j \lim_i b_{ij}$, is
obtained by forming the limit of each column, assembling the results, and then
taking the limit of the column limits through the columns. If we use the particular
infinite matrix

$$\begin{pmatrix} 1 & 1 & 1 & 1 & \cdots \\ 0 & 1 & 1 & 1 & \cdots \\ 0 & 0 & 1 & 1 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ \vdots & & & & \ddots \end{pmatrix},$$


then we see that the first iterated limit depends only on the part of the matrix above 
the main diagonal, while the second iterated limit depends only on the part of the 
matrix below the main diagonal. Thus the two iterated limits in general have no 
reason at all to be related. In the specific matrix that we have just considered, 
they are 1 and 0, respectively. Let us consider some examples along the same 
lines but with an analytic flavor. 
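
The two iterated limits for this matrix can be traced in a short sketch. It assumes the entries are $b_{ij} = 1$ for $j \ge i$ and $b_{ij} = 0$ for $j < i$ (the diagonal value is an assumption and affects neither limit), and it uses a hypothetical function `b` and a large index `BIG` in place of a genuine limit. For this particular matrix the stand-in is exact, because each row and each column is eventually constant.

```python
def b(i: int, j: int) -> int:
    """Entry in row i, column j: ones on and above the diagonal, zeros below."""
    return 1 if j >= i else 0


BIG = 10**6  # any index beyond the rows and columns inspected below

# Row i is constantly 1 once j >= i, so its limit in j is b(i, BIG) = 1.
row_limits = [b(i, BIG) for i in range(1, 11)]
# Column j is constantly 0 once i > j, so its limit in i is b(BIG, j) = 0.
col_limits = [b(BIG, j) for j in range(1, 11)]
```

All the row limits are 1 and all the column limits are 0, so the limit of the row limits is 1 and the limit of the column limits is 0.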


EXAMPLES. 


(2) Let $F_n$ be a continuous real-valued function on R, and suppose that $F(x) = \lim F_n(x)$
exists for every $x$. Is $F$ continuous? This is the same kind of question.
It asks whether $\lim_{t\to x} F(t) \overset{?}{=} F(x)$, hence whether

$$\lim_{t\to x} \lim_{n\to\infty} F_n(t) \overset{?}{=} \lim_{n\to\infty} \lim_{t\to x} F_n(t).$$

If we take $f_k(x) = \dfrac{x^2}{(1+x^2)^k}$ for $k \ge 0$ and define $F_n(x) = \sum_{k=0}^{n} f_k(x)$, then
each $F_n$ is continuous. The sequence of functions $\{F_n\}$ has a pointwise limit
$F(x) = \sum_{k=0}^{\infty} \dfrac{x^2}{(1+x^2)^k}$. The series is a geometric series, and we can easily
calculate explicitly the partial sums and the limit function. The latter is

$$F(x) = \begin{cases} 0 & \text{if } x = 0, \\ 1 + x^2 & \text{if } x \ne 0. \end{cases}$$

It is apparent that the limit function is discontinuous.
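
A quick numerical look at Example 2 may help. The sketch below is illustrative only: the function names `F_n` and `F` and the sample points are choices made here. It sums the series directly and compares the result with the closed form for the limit.

```python
def F_n(x: float, n: int) -> float:
    """Partial sum: sum over k = 0..n of x^2 / (1 + x^2)^k."""
    return sum(x**2 / (1 + x**2) ** k for k in range(n + 1))


def F(x: float) -> float:
    """Pointwise limit of F_n: 0 at x = 0, and 1 + x^2 elsewhere."""
    return 0.0 if x == 0 else 1 + x**2


at_zero = F_n(0.0, 50)        # every term vanishes at x = 0
near_limit = F_n(0.1, 5000)   # close to F(0.1) = 1.01 once n is large
still_small = F_n(0.001, 100) # for fixed n, F_n stays near 0 close to the origin
```

The last value shows why the limit is discontinuous: for any fixed $n$ the continuous function $F_n$ is small near $x = 0$, even though its pointwise limit there is close to 1.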


(3) Let $\{f_n\}$ be a sequence of differentiable functions, and suppose that $f(x) = \lim f_n(x)$
exists for every $x$ and is differentiable. Is $\lim f_n'(x) = f'(x)$? This
question comes down to whether

$$\lim_{n\to\infty} \lim_{t\to x} \frac{f_n(t) - f_n(x)}{t - x} \overset{?}{=} \lim_{t\to x} \lim_{n\to\infty} \frac{f_n(t) - f_n(x)}{t - x}.$$

An example where the answer is negative uses the sine and cosine functions,
which are undefined in the rigorous development until Section 7 on power series.
The example has $f_n(x) = \dfrac{\sin nx}{\sqrt{n}}$ for $n \ge 1$. Then $\lim_n f_n(x) = 0$, so that
$f(x) = 0$ and $f'(x) = 0$. Also, $f_n'(x) = \sqrt{n}\cos nx$, so that $f_n'(0) = \sqrt{n}$ does
not tend to $0 = f'(0)$.
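
The behavior in Example 3 can be seen in numbers. The following is a minimal sketch with hypothetical helper names `f` and `fprime`; the derivative is coded from the formula $f_n'(x) = \sqrt{n}\cos nx$ rather than computed numerically, and the sample values of $n$ and $x$ are arbitrary.

```python
import math


def f(n: int, x: float) -> float:
    """f_n(x) = sin(n x) / sqrt(n), which is at most 1/sqrt(n) in absolute value."""
    return math.sin(n * x) / math.sqrt(n)


def fprime(n: int, x: float) -> float:
    """f_n'(x) = sqrt(n) * cos(n x)."""
    return math.sqrt(n) * math.cos(n * x)


value = f(10**6, 0.3)       # tiny, because of the 1/sqrt(n) factor
slope = fprime(10**6, 0.0)  # equals sqrt(10**6), which is large
```

So the functions shrink to 0 while their slopes at the origin grow without bound.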


Yet we know many examples from calculus where an interchange of limits is
valid. For example, in calculus of two variables, the first partial derivatives of
nice functions (polynomials, for example) can be computed in either order with
the same result, and double integrals of continuous functions over a rectangle can
be calculated as iterated integrals in either order with the same result. Positive
theorems about interchanging limits are usually based on some kind of uniform
behavior, in a sense that we take up in the next section. A number of positive
results of this kind ultimately come down to the following general theorem about
doubly indexed sequences that are monotone increasing in each variable. In
Section 3 we shall examine the mechanism of this theorem closely: the proof
shows that the equality in question is $\sup_i \sup_j b_{ij} = \sup_j \sup_i b_{ij}$ and that it
holds because both sides equal $\sup_{i,j} b_{ij}$.


Theorem 1.13. Let $b_{ij}$ be members of R* that are $\ge 0$ for all $i$ and $j$. Suppose
that $b_{ij}$ is monotone increasing in $i$, for each $j$, and is monotone increasing in $j$,
for each $i$. Then

$$\lim_i \lim_j b_{ij} = \lim_j \lim_i b_{ij},$$

with all the indicated limits existing in R*.


PROOF. Put $L_i = \lim_j b_{ij}$ and $L'_j = \lim_i b_{ij}$. These limits exist in R*, since
the sequences in question are monotone. Then $L_i \le L_{i+1}$ and $L'_j \le L'_{j+1}$, and
thus

$$L = \lim_i L_i \quad \text{and} \quad L' = \lim_j L'_j$$

both exist in R*. Arguing by contradiction, suppose that $L < L'$. Then we can
choose $j_0$ such that $L'_{j_0} > L$. Since $L'_{j_0} = \lim_i b_{ij_0}$, we can choose $i_0$ such that
$b_{i_0 j_0} > L$. Then we have $L < b_{i_0 j_0} \le L_{i_0} \le L$, contradiction. Similarly the
assumption $L' < L$ leads to a contradiction. We conclude that $L = L'$.


Corollary 1.14. If $a_{lj}$ are members of R* that are $\ge 0$ and are monotone
increasing in $j$ for each $l$, then

$$\lim_j \sum_l a_{lj} = \sum_l \lim_j a_{lj}$$

in R*, the limits existing.


REMARK. This result will be generalized by the Monotone Convergence 
Theorem when we study abstract measure theory in Chapter V. 


PROOF. Put $b_{ij} = \sum_{l=1}^{i} a_{lj}$ in Theorem 1.13.


Corollary 1.15. If $c_{ij}$ are members of R* that are $\ge 0$ for all $i$ and $j$, then

$$\sum_i \sum_j c_{ij} = \sum_j \sum_i c_{ij}$$

in R*, the limits existing.


REMARK. This result will be generalized by Fubini’s Theorem when we study 
abstract measure theory in Chapter V. 


PROOF. This follows from Corollary 1.14. 


3. Uniform Convergence 


Let us examine more closely what is happening in the proof of Theorem 1.13, in
which it is proved that iterated limits can be interchanged under certain hypotheses
of monotonicity. One of the iterated limits is $L = \lim_i \lim_j b_{ij}$, and the claim is
that $L$ is approached as $i$ and $j$ tend to infinity jointly. In terms of a matrix whose
entries are the various $b_{ij}$'s, the pictorial assertion is that all the terms far down
and to the right are close to $L$:

[Diagram: matrix of the $b_{ij}$; label on the region far down and to the right: "All terms here are close to $L$."]


To see this claim, let us choose a row limit $L_{i_0}$ that is close to $L$ and then take an
entry $b_{i_0 j_0}$ that is close to $L_{i_0}$. Then $b_{i_0 j_0}$ is close to $L$, and all terms down and to
the right from there are even closer because of the hypothesis of monotonicity.

To relate this behavior to something uniform, suppose that $L < +\infty$, and let
some $\epsilon > 0$ be given. We have just seen that we can arrange to have $|L - b_{ij}| < \epsilon$
whenever $i \ge i_0$ and $j \ge j_0$. Then $|L_i - b_{ij}| < \epsilon$ whenever $i \ge i_0$, provided
$j \ge j_0$. Also, we have $\lim_j b_{ij} = L_i$ for $i = 1, 2, \dots, i_0 - 1$. Thus $|L_i - b_{ij}| < \epsilon$
for all $i$, provided $j \ge j_0'$, where $j_0'$ is some larger index than $j_0$. This is the
notion of uniform convergence that we shall define precisely in a moment: an
expression with a parameter ($i$ in our case) has a limit (on the variable $j$ in our
case) with an estimate independent of the parameter. We can visualize matters as
in the following matrix:


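[Diagram: matrix of the $b_{ij}$ with a vertical dividing line; label on the part to the right of the line: "All terms here are close to $L_i$ on all rows."]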


The vertical dividing line occurs when the column index $j$ is equal to $j_0'$, and all
terms to the right of this line are close to their respective row limits $L_i$.

Let us see the effect of this situation on the problem of interchange of limits.
The above diagram forces all the terms in the shaded part of [diagram] to
be close to one number if $\lim L_i$ exists, i.e., if the row limits are tending to a
limit. If the other iterated limit exists, then it must be this same number. Thus
the interchange of limits is valid under these circumstances.

Actually, we can get by with less. If, in the displayed diagram above, we
assume that all the column limits $L'_j$ exist, then it appears that all the column
limits with $j \ge j_0'$ have to be close to the $L_i$'s. From this we can deduce that the
column limits have a limit $L'$ and that the row limits $L_i$ must tend to the limit
of the column limits. In other words, the convergence of the rows in a suitable
uniform fashion and the convergence of the columns together imply that both
iterated limits exist and they are equal. We shall state this result rigorously as
Proposition 1.16, which will become a prototype for applications later in this
section.

Let $S$ be a nonempty set, and let $f$ and $f_n$, for integers $n \ge 1$, be functions
from $S$ to R. We say that $f_n(x)$ converges to $f(x)$ uniformly for $x$ in $S$ if for
any $\epsilon > 0$, there is an integer $N$ such that $n \ge N$ implies $|f_n(x) - f(x)| < \epsilon$ for
all $x$ in $S$. It is equivalent to say that $\sup_{x\in S} |f_n(x) - f(x)|$ tends to 0 as $n$ tends
to infinity.


Proposition 1.16. Let $b_{ij}$ be real numbers for $i \ge 1$ and $j \ge 1$. Suppose that

(i) $L_i = \lim_j b_{ij}$ exists in R uniformly in $i$, and

(ii) $L'_j = \lim_i b_{ij}$ exists in R for each $j$.

Then

(a) $L = \lim_i L_i$ exists in R,

(b) $L' = \lim_j L'_j$ exists in R,

(c) $L = L'$,

(d) the double limit on $i$ and $j$ of $b_{ij}$ exists and equals the common value of
the iterated limits $L$ and $L'$, i.e., for each $\epsilon > 0$, there exist $i_0$ and $j_0$ such
that $|b_{ij} - L| < \epsilon$ whenever $i \ge i_0$ and $j \ge j_0$,

(e) $L'_j = \lim_i b_{ij}$ exists in R uniformly in $j$.


REMARK. In applications we shall sometimes have additional information, 
typically the validity of (a) or (b). According to the statement of the proposition, 
however, the conclusions are valid without taking this extra information as an 
additional hypothesis. 


PROOF. Let $\epsilon > 0$ be given. By (i), choose $j_0$ such that $|b_{ij} - L_i| < \epsilon$ for all
$i$ whenever $j \ge j_0$. With $j \ge j_0$ fixed, (ii) says that $|b_{ij} - L'_j| < \epsilon$ whenever $i$ is
$\ge$ some $i_0 = i_0(j)$. For $j \ge j_0$ and $i \ge i_0(j)$, we then have

$$|L_i - L'_j| \le |L_i - b_{ij}| + |b_{ij} - L'_j| < \epsilon + \epsilon = 2\epsilon.$$

If $j' \ge j_0$ and $i \ge i_0(j')$, we similarly have $|L_i - L'_{j'}| < 2\epsilon$. Hence if $j \ge j_0$,
$j' \ge j_0$, and $i \ge \max\{i_0(j), i_0(j')\}$, then

$$|L'_j - L'_{j'}| \le |L'_j - L_i| + |L_i - L'_{j'}| < 2\epsilon + 2\epsilon = 4\epsilon.$$

In other words, $\{L'_j\}$ is a Cauchy sequence. By Theorem 1.9, $L' = \lim_j L'_j$ exists
in R. This proves (b).

Passing to the limit in our inequality, we have $|L'_j - L'| \le 4\epsilon$ when $j \ge j_0$
and in particular when $j = j_0$. If $i \ge i_0(j_0)$, then we saw that $|b_{ij_0} - L_i| < \epsilon$
and $|b_{ij_0} - L'_{j_0}| < \epsilon$. Hence $i \ge i_0(j_0)$ implies

$$|L_i - L'| \le |L_i - b_{ij_0}| + |b_{ij_0} - L'_{j_0}| + |L'_{j_0} - L'| < \epsilon + \epsilon + 4\epsilon = 6\epsilon.$$

Since $\epsilon$ is arbitrary, $L = \lim_i L_i$ exists and equals $L'$. This proves (a) and (c).

Since $\lim_i L_i = L$, choose $i_1$ such that $|L_i - L| < \epsilon$ whenever $i \ge i_1$. If $i \ge i_1$
and $j \ge j_0$, we then have

$$|b_{ij} - L| \le |b_{ij} - L_i| + |L_i - L| < \epsilon + \epsilon = 2\epsilon.$$

This proves (d).

Let $i_1$ and $j_0$ be as in the previous paragraph. We have seen that $|L'_j - L'_{j'}| < 4\epsilon$
for $j \ge j_0$ and $j' \ge j_0$. By (b), $|L'_j - L'| \le 4\epsilon$ whenever $j \ge j_0$. Hence (c) and the inequality
of the previous paragraph give

$$|b_{ij} - L'_j| \le |b_{ij} - L| + |L - L'| + |L' - L'_j| < 2\epsilon + 0 + 4\epsilon = 6\epsilon$$

whenever $i \ge i_1$ and $j \ge j_0$. By (ii), choose $i_2 \ge i_1$ such that $|b_{ij} - L'_j| < 6\epsilon$
whenever $j \in \{1, \dots, j_0 - 1\}$ and $i \ge i_2$. Then $i \ge i_2$ implies $|b_{ij} - L'_j| < 6\epsilon$
for all $j$. This proves (e).


In checking for uniform convergence, we often do not have access to explicit
expressions for limiting values. One device for dealing with the problem is a
uniform version of the Cauchy criterion. Let $S$ be a nonempty set, and let $\{f_n\}_{n\ge 1}$
be a sequence of functions from $S$ to R. We say that $\{f_n(x)\}$ is uniformly Cauchy
for $x \in S$ if for any $\epsilon > 0$, there is an integer $N$ such that $n \ge N$ and $m \ge N$
together imply $|f_n(x) - f_m(x)| < \epsilon$ for all $x$ in $S$.


Proposition 1.17 (uniform Cauchy criterion). A sequence $\{f_n\}$ of functions
from a nonempty set $S$ to R is uniformly Cauchy if and only if it is uniformly
convergent.


PROOF. If $\{f_n\}$ is uniformly convergent to $f$, we use a $2\epsilon$ argument, just as
in the example before Theorem 1.9: Given $\epsilon > 0$, choose $N$ such that $n \ge N$
implies $|f_n(x) - f(x)| < \epsilon$. Then $n \ge N$ and $m \ge N$ together imply

$$|f_n(x) - f_m(x)| \le |f_n(x) - f(x)| + |f(x) - f_m(x)| < \epsilon + \epsilon = 2\epsilon.$$

Thus $\{f_n\}$ is uniformly Cauchy.

Conversely suppose that $\{f_n\}$ is uniformly Cauchy. Then $\{f_n(x)\}$ is Cauchy for
each $x$. Theorem 1.9 therefore shows that there exists a function $f : S \to \mathbb{R}$ such
that $\lim_n f_n(x) = f(x)$ for each $x$. We prove that the convergence is uniform.
Given $\epsilon > 0$, choose $N$, as is possible since $\{f_n\}$ is uniformly Cauchy, such that
$n \ge N$ and $m \ge N$ together imply $|f_n(x) - f_m(x)| < \epsilon$. Letting $m$ tend to $\infty$
shows that $|f_n(x) - f(x)| \le \epsilon$ for $n \ge N$. Hence $\lim_n f_n(x) = f(x)$ uniformly
for $x$ in $S$.

In practice, uniform convergence often arises with infinite series of functions,
and then the definition and results about uniform convergence are to be applied to
the sequence of partial sums. If the series is $\sum_{k=1}^{\infty} a_k(x)$, one wants $\big|\sum_{k=m}^{n} a_k(x)\big|$
to be small for all $m$ and $n$ sufficiently large. Some of the standard tests for
convergence of series of numbers yield tests for uniform convergence of series of
functions just by introducing a parameter and ensuring that the estimates do not
depend on the parameter. We give two clear-cut examples. One is the uniform
alternating series test or Leibniz test, given in Corollary 1.18. A generalization
is the handy test given in Corollary 1.19.


Corollary 1.18. If for each $x$ in a nonempty set $S$, $\{a_n(x)\}_{n\ge 1}$ is a monotone
decreasing sequence of nonnegative real numbers such that $\lim_n a_n(x) = 0$
uniformly in $x$, then $\sum_{n=1}^{\infty} (-1)^n a_n(x)$ converges uniformly.


PROOF. The hypotheses are such that $\big|\sum_{k=m}^{n} (-1)^k a_k(x)\big| \le \sup_x |a_m(x)|$
whenever $n \ge m$, and the uniform convergence is immediate from the uniform
Cauchy criterion.


Corollary 1.19. If for each $x$ in a nonempty set $S$, $\{a_n(x)\}_{n\ge 1}$ is a monotone
decreasing sequence of nonnegative real numbers such that $\lim_n a_n(x) = 0$
uniformly in $x$ and if $\{b_n(x)\}_{n\ge 1}$ is a sequence of real-valued functions on $S$
whose partial sums $B_n(x) = \sum_{k=1}^{n} b_k(x)$ have $|B_n(x)| \le M$ for some $M$ and all
$n$ and $x$, then $\sum_{n=1}^{\infty} a_n(x)b_n(x)$ converges uniformly.


PROOF. If $n \ge m$, summation by parts gives

$$\sum_{k=m}^{n} a_k(x)b_k(x) = \sum_{k=m}^{n-1} B_k(x)\big(a_k(x) - a_{k+1}(x)\big) + B_n(x)a_n(x) - B_{m-1}(x)a_m(x).$$

Let $\epsilon > 0$ be given, and choose $N$ such that $a_k(x) \le \epsilon$ for all $x$ whenever $k \ge N$.
If $n \ge m \ge N$, then

$$\Big|\sum_{k=m}^{n} a_k(x)b_k(x)\Big| \le \sum_{k=m}^{n-1} |B_k(x)|\big(a_k(x) - a_{k+1}(x)\big) + M\epsilon + M\epsilon$$

$$\le M\sum_{k=m}^{n-1} \big(a_k(x) - a_{k+1}(x)\big) + 2M\epsilon$$

$$\le Ma_m(x) + 2M\epsilon$$

$$\le 3M\epsilon,$$

and the uniform convergence is immediate from the uniform Cauchy criterion.
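
The summation-by-parts identity used in this proof is easy to mistype, so a small exact check may be useful. The sketch below is an illustration only: the helper names and the test sequences $a_k = 1/k$ and $b_k = (-1)^k$ are choices made here, and rational arithmetic from Python's `fractions` module is used so that both sides can be compared without rounding. Lists are indexed from 1, with index 0 unused.

```python
from fractions import Fraction


def partial_sums(b):
    """Return B with B[k] = b[1] + ... + b[k] and B[0] = 0; b[0] is unused."""
    B = [Fraction(0)] * len(b)
    for k in range(1, len(b)):
        B[k] = B[k - 1] + b[k]
    return B


def both_sides(a, b, m, n):
    """Evaluate the two sides of the summation-by-parts identity for m <= n."""
    B = partial_sums(b)
    lhs = sum(a[k] * b[k] for k in range(m, n + 1))
    rhs = (sum(B[k] * (a[k] - a[k + 1]) for k in range(m, n))
           + B[n] * a[n] - B[m - 1] * a[m])
    return lhs, rhs


K = 20
a = [Fraction(0)] + [Fraction(1, k) for k in range(1, K + 1)]       # a_k = 1/k
b = [Fraction(0)] + [Fraction((-1) ** k) for k in range(1, K + 1)]  # b_k = (-1)^k
lhs, rhs = both_sides(a, b, 3, 17)
```

Here the partial sums of $b_k$ are bounded by $M = 1$, and the block sum also respects the bound $3M\epsilon$ from the proof with $\epsilon = a_3$.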

A third consequence can be considered as a uniform version of the result that
absolute convergence implies convergence. In practice it tends to be fairly easy 
to apply, but it applies only in the simplest situations. 


Proposition 1.20 (Weierstrass M test). Let $S$ be a nonempty set, and let $\{f_n\}$
be a sequence of real-valued functions on $S$ such that $|f_n(x)| \le M_n$ for all $x$ in
$S$. Suppose that $\sum_{n=1}^{\infty} M_n < +\infty$. Then $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly for $x$ in
$S$.


PROOF. If $n \ge m \ge N$, then $\bigl|\sum_{k=m+1}^{n} f_k(x)\bigr| \le \sum_{k=m+1}^{n} |f_k(x)| \le \sum_{k=N+1}^{\infty} M_k$,
and the right side tends to 0 uniformly in $x$ as $N$ tends to infinity. Therefore the
result follows from the uniform Cauchy criterion.


EXAMPLES.
(1) The series

$$\sum_{n=1}^{\infty} \frac{x^n}{n^2}$$

converges uniformly for $-1 \le x \le 1$ by the Weierstrass M test with $M_n = 1/n^2$.
(2) The series

$$\sum_{n=1}^{\infty} (-1)^n \frac{x^2 + n}{n^2}$$

converges uniformly for $-1 \le x \le 1$, but the M test does not apply. To see
that the M test does not apply, we use the smallest possible $M_n$, which is $M_n =
\sup_x \bigl|(-1)^n \frac{x^2+n}{n^2}\bigr| = \frac{n+1}{n^2}$. The series $\sum \frac{n+1}{n^2}$ diverges, and hence the M test
cannot apply for any choice of the numbers $M_n$. To see the uniform convergence
of the given series, we observe that the terms strictly alternate in sign. Also,

$$\frac{x^2+n}{n^2} \ge \frac{x^2+(n+1)}{(n+1)^2} \quad\text{because}\quad \frac{x^2}{n^2} \ge \frac{x^2}{(n+1)^2} \quad\text{and}\quad \frac{1}{n} \ge \frac{1}{n+1}.$$

Finally

$$\frac{x^2+n}{n^2} \le \frac{n+1}{n^2} \to 0$$

uniformly for $-1 \le x \le 1$. Hence the series converges uniformly by the uniform
Leibniz test (Corollary 1.18).
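
The two examples can be probed numerically. The following sketch is an illustration under stated assumptions: it samples $x$ on a finite grid in $[-1, 1]$ and truncates every series at a finite index, so it checks the estimates rather than proving them, and all function names are hypothetical. It compares a tail of Example 1 with the tail of $\sum 1/n^2$, shows that the partial sums of the smallest admissible $M_n = (n+1)/n^2$ in Example 2 exceed a logarithm, and checks the alternating-series bound $(m+1)/m^2$ for blocks of Example 2.

```python
import math

def m_test_tail_bound(N, K):
    """Sum of M_k = 1/k^2 for N < k <= K (bound from the M test in Example 1)."""
    return sum(1.0 / k**2 for k in range(N + 1, K + 1))

def tail_sup_example1(N, K, grid):
    """Largest |sum_{N<k<=K} x^k / k^2| over the sample points in grid."""
    return max(abs(sum(x**k / k**2 for k in range(N + 1, K + 1))) for x in grid)

def smallest_M_partial_sum(n):
    """Partial sum of M_k = (k+1)/k^2, the smallest possible bounds in Example 2."""
    return sum((k + 1) / k**2 for k in range(1, n + 1))

def alternating_block_sup(m, n, grid):
    """Largest |sum_{m<=k<=n} (-1)^k (x^2+k)/k^2| over the sample points in grid."""
    return max(abs(sum((-1)**k * (x * x + k) / k**2 for k in range(m, n + 1)))
               for x in grid)

# 201 equally spaced sample points in [-1, 1], including both endpoints.
grid = [-1 + 2 * i / 200 for i in range(201)]
```

The partial sums of $(n+1)/n^2$ dominate the harmonic sums, which is why they exceed $\log(n+1)$ and diverge, while the alternating blocks stay below their first-term bound.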


Having developed some tools for proving uniform convergence, let us apply
the notion of uniform convergence to interchanges of limits involving functions
of a real variable. For a point of reference, recall the diagrams of interchanges of
limits at the beginning of the section. We take the column index to be $n$ and think
of the row index as a variable $t$, which is tending to $x$. We make assumptions
that correspond to (i) and (ii) in Proposition 1.16, namely that $\{f_n(t)\}$ converges
uniformly in $t$ as $n$ tends to infinity, say to $f(t)$, and that $f_n(t)$ converges to some
limit $f_n(x)$ as $t$ tends to $x$. With $f_n(x)$ defined as this limit, $f_n$ is continuous
at $x$. In other words, the assumptions are that the sequence $\{f_n\}$ is uniformly
convergent to $f$ and each $f_n$ is continuous.


Theorem 1.21. If $\{f_n\}$ is a sequence of real-valued functions on $[a, b]$ that are
continuous at $x$ and if $\{f_n\}$ converges to $f$ uniformly, then $f$ is continuous at $x$.


REMARKS. This is really a consequence of Proposition 1.16 except that one of
the indices, namely $t$, is regarded as continuous and not discrete. Actually, there is
a subtle simplification here, by comparison with Proposition 1.16, in that $\{f_n(x)\}$
at the limiting parameter $x$ is being assumed to tend to $f(x)$. This corresponds
to assuming (b) in the proposition, as well as (i) and (ii). Consequently the proof
of the theorem will be considerably simpler than the proof of Proposition 1.16.
In fact, the proof will be our first example of a $3\epsilon$ proof. In many applications
of Theorem 1.21, the given sequence $\{f_n\}$ is continuous at every $x$, and then the
conclusion is that $f$ is continuous at every $x$.


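PROOF. We write

$$f(t) - f(x) = \bigl(f(t) - f_n(t)\bigr) + \bigl(f_n(t) - f_n(x)\bigr) + \bigl(f_n(x) - f(x)\bigr).$$

Given $\epsilon > 0$, choose $N$ large enough so that $|f_n(t) - f(t)| < \epsilon$ for all $t$ whenever
$n \ge N$. With such an $n$ fixed, choose some $\delta$ of continuity for the function
$f_n$, the point $x$, and the number $\epsilon$. For $|t - x| < \delta$, each term above is then $< \epsilon$ in
absolute value, and hence $|f(t) - f(x)| < 3\epsilon$. Since $\epsilon$ is arbitrary, $f$ is continuous at $x$.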
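
Uniform convergence cannot be weakened to pointwise convergence in Theorem 1.21. A standard illustration, which is an assumption brought in here rather than an example taken from this section, is $f_n(x) = x^n$ on $[0, 1]$: every $f_n$ is continuous, but the pointwise limit is 0 for $x < 1$ and 1 at $x = 1$. The sketch below (function names hypothetical) evaluates $|f_n - f|$ at the single point $x = 1 - 1/n$, which already bounds the supremum from below.

```python
def f(n, x):
    """The n-th function of the illustrative sequence f_n(x) = x^n."""
    return x ** n

def pointwise_limit(x):
    """Pointwise limit of x^n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def sup_distance_lower_bound(n):
    """|f_n(x) - f(x)| at x = 1 - 1/n, a lower bound for sup over [0, 1]."""
    x = 1.0 - 1.0 / n
    return abs(f(n, x) - pointwise_limit(x))
```

Since $(1 - 1/n)^n \ge 1/4$ for $n \ge 2$, the supremum of $|f_n - f|$ does not tend to 0, so the convergence is not uniform, consistent with the discontinuity of the limit at $x = 1$.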


Theorem 1.21 in effect uses only conclusion (c) of Proposition 1.16, which 
concerns the equality of the two iterated limits. Conclusion (d) gives a stronger 
result, namely that the double limit exists and equals each iterated limit. The 
strengthened version of Theorem 1.21 is as follows. 


Theorem 1.21′. If $\{f_n\}$ is a sequence of real-valued functions on $[a, b]$ that
are continuous at $x$ and if $\{f_n\}$ converges to $f$ uniformly, then for each $\epsilon > 0$,
there exist an integer $N$ and a number $\delta > 0$ such that

$$|f_n(t) - f(x)| < \epsilon$$

whenever $n \ge N$ and $|t - x| < \delta$.

PROOF. If $\epsilon > 0$ is given, choose $N$ such that $|f_n(t) - f(t)| < \epsilon/2$ for all
$t$ whenever $n \ge N$, and choose $\delta$ in the conclusion of Theorem 1.21 such that
$|t - x| < \delta$ implies $|f(t) - f(x)| < \epsilon/2$. Then

$$|f_n(t) - f(x)| \le |f_n(t) - f(t)| + |f(t) - f(x)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

whenever $n \ge N$ and $|t - x| < \delta$. Theorem 1.21′ follows.

In interpreting our diagrams of interchanges of limits to get at the statement of
Theorem 1.21, we took the column index to be $n$ and thought of the row index as
a variable $t$, which was tending to $x$. It is instructive to see what happens when
the roles of $n$ and $t$ are reversed, i.e., when the row index is $n$ and the column
index is the variable $t$, which is tending to $x$. Again we have $f_n(t)$ converging
to $f(t)$ and $\lim_{t\to x} f_n(t) = f_n(x)$, but the uniformity is different. This time
we want the uniformity to be in $n$ as $t$ tends to $x$. This means that the $\delta$ of
continuity that corresponds to $\epsilon$ can be taken independent of $n$. This is the notion
of “equicontinuity,” and there is a classical theorem about it. The theorem is
actually stronger than Proposition 1.16 suggests, since the theorem assumes less
than that $f_n(t)$ converges to $f(t)$ for all $t$.

Let $\mathcal{F} = \{f_\alpha \mid \alpha \in A\}$ be a set of real-valued functions on a bounded interval
$[a, b]$. We say that $\mathcal{F}$ is equicontinuous at $x \in [a, b]$ if for each $\epsilon > 0$, there is
some $\delta > 0$ such that $|t - x| < \delta$ implies $|f(t) - f(x)| < \epsilon$ for all $f \in \mathcal{F}$. The set
$\mathcal{F}$ of functions is pointwise bounded if for each $t \in [a, b]$, there exists a number
$M_t$ such that $|f(t)| \le M_t$ for all $f \in \mathcal{F}$. The set is uniformly equicontinuous on
$[a, b]$ if it is equicontinuous at each point $x$ and if the $\delta$ can be taken independent
of $x$. The set is uniformly bounded on $[a, b]$ if it is pointwise bounded at each
$t \in [a, b]$ and the bound $M_t$ can be taken independent of $t$.


Theorem 1.22 (Ascoli’s Theorem). If $\{f_n\}$ is a sequence of real-valued functions
on a closed bounded interval $[a, b]$ that is equicontinuous at each point of
$[a, b]$ and pointwise bounded on $[a, b]$, then

(a) $\{f_n\}$ is uniformly equicontinuous and uniformly bounded on $[a, b]$,
(b) $\{f_n\}$ has a uniformly convergent subsequence.


PROOF. Since each $f_n$ is continuous at each point, we know from Theorems
1.10 and 1.11 that each $f_n$ is uniformly continuous and bounded. The proof of
(a) amounts to an argument that the estimates in those theorems can be arranged
to apply simultaneously for all $n$.

First consider the question of uniform boundedness. Choose, by Theorem 1.11,
some $x_n$ in $[a, b]$ with $|f_n(x_n)|$ equal to $K_n = \sup_{x\in[a,b]} |f_n(x)|$. Then choose a
subsequence on which the numbers $K_n$ tend to $\sup_n K_n$ in $\mathbb{R}^*$. There will be no
loss of generality in assuming that this subsequence is our whole sequence. Apply
the Bolzano-Weierstrass Theorem to find a convergent subsequence $\{x_{n_k}\}$ of $\{x_n\}$,
say with limit $x_0$. By pointwise boundedness, find $M_{x_0}$ with $|f_n(x_0)| \le M_{x_0}$ for
all $n$. Then choose some $\delta$ of equicontinuity at $x_0$ for $\epsilon = 1$. As soon as $k$ is large
enough so that $|x_{n_k} - x_0| < \delta$, we have

$$K_{n_k} = |f_{n_k}(x_{n_k})| \le |f_{n_k}(x_{n_k}) - f_{n_k}(x_0)| + |f_{n_k}(x_0)| < 1 + M_{x_0}.$$

Thus $1 + M_{x_0}$ is a uniform bound for the functions $f_n$.

The proof of uniform equicontinuity proceeds in the same spirit but takes
longer to write out. Fix $\epsilon > 0$. The uniform continuity of $f_n$ for each $n$ means
that it makes sense to define

$$\delta_n(\epsilon) = \min\Bigl\{1,\ \sup\bigl\{\delta' > 0 \bigm| |f_n(x) - f_n(y)| < \epsilon \text{ whenever } |x - y| < \delta' \text{ and $x$ and $y$ are in the domain of } f_n\bigr\}\Bigr\}.$$

If $|x - y| < \delta_n(\epsilon)$, then $|f_n(x) - f_n(y)| < \epsilon$. Put $\delta(\epsilon) = \inf_n \delta_n(\epsilon)$. Let us see
that it is enough to prove that $\delta(\epsilon) > 0$: If $x$ and $y$ are in $[a, b]$ with $|x - y| < \delta(\epsilon)$,
then $|x - y| < \delta(\epsilon) \le \delta_n(\epsilon)$. Hence $|f_n(x) - f_n(y)| < \epsilon$ as required.

Thus we are to prove that $\delta(\epsilon) > 0$. If $\delta(\epsilon) = 0$, then we first choose an
increasing sequence $\{n_k\}$ of positive integers such that $\delta_{n_k}(\epsilon) < \frac{1}{k}$, and we next
choose $x_k$ and $y_k$ in $[a, b]$ with $|x_k - y_k| < \frac{1}{k}$ and $|f_{n_k}(x_k) - f_{n_k}(y_k)| \ge \epsilon$; such points
exist because $\frac{1}{k}$ exceeds the supremum in the definition of $\delta_{n_k}(\epsilon)$.
Applying the Bolzano-Weierstrass Theorem, we obtain a subsequence $\{x_{k_l}\}$ of
$\{x_k\}$ such that $\{x_{k_l}\}$ converges, say to $x_0$. Then

$$\limsup_l |y_{k_l} - x_0| \le \limsup_l |y_{k_l} - x_{k_l}| + \limsup_l |x_{k_l} - x_0| = 0 + 0 = 0,$$

so that $\{y_{k_l}\}$ converges to $x_0$. Now choose, by equicontinuity at $x_0$, a number
$\delta' > 0$ such that $|f_n(x) - f_n(x_0)| < \frac{\epsilon}{2}$ for all $n$ whenever $|x - x_0| < \delta'$. The
convergence of $\{x_{k_l}\}$ and $\{y_{k_l}\}$ to $x_0$ implies that for large enough $l$, we have
$|x_{k_l} - x_0| < \delta'/2$ and $|y_{k_l} - x_0| < \delta'/2$. Therefore $|f_{n_{k_l}}(x_{k_l}) - f_{n_{k_l}}(x_0)| < \frac{\epsilon}{2}$ and
$|f_{n_{k_l}}(y_{k_l}) - f_{n_{k_l}}(x_0)| < \frac{\epsilon}{2}$, from which we conclude that $|f_{n_{k_l}}(x_{k_l}) - f_{n_{k_l}}(y_{k_l})| < \epsilon$.
But we saw that $|f_{n_k}(x_k) - f_{n_k}(y_k)| \ge \epsilon$ for all $k$, and thus we have arrived at a
contradiction. This proves the uniform equicontinuity and completes the proof
of (a).

To prove (b), we first construct a subsequence of $\{f_n\}$ that is convergent at
every rational point in $[a, b]$. We enumerate the rationals, say as $x_1, x_2, \dots$. By
the Bolzano-Weierstrass Theorem and the pointwise boundedness, we can find
a subsequence of $\{f_n\}$ that is convergent at $x_1$, a subsequence of the result that
is convergent at $x_2$, a subsequence of the result that is convergent at $x_3$, and so
on. The trouble with this process is that each term of our original sequence may
disappear at some stage, and then we are left with no terms that address all the
rationals. The trick is to form the subsequence $\{f_{n_k}\}$ of the given $\{f_n\}$ whose
$k$th term is the $k$th term of the $k$th subsequence we constructed. Then the $k$th,
$(k+1)$st, $(k+2)$nd, ... terms of $\{f_{n_k}\}$ all lie in our $k$th constructed subsequence,
and hence $\{f_{n_k}\}$ converges at the first $k$ points $x_1, \dots, x_k$. Since $k$ is arbitrary,
$\{f_{n_k}\}$ converges at every rational point.

Let us prove that $\{f_{n_k}\}$ is uniformly Cauchy. In the estimates that follow, $f_n$ and
$f_m$ denote members of this subsequence. Let $\epsilon > 0$ be given, let $\delta$ be
some corresponding number exhibiting equicontinuity, and choose finitely many
rationals $r_1, \dots, r_l$ in $[a, b]$ such that any member of $[a, b]$ is within $\delta$ of at
least one of these rationals. Then choose $N$ such that $|f_n(r_j) - f_m(r_j)| < \epsilon$ for
$1 \le j \le l$ whenever $n$ and $m$ are $\ge N$. If $x$ is in $[a, b]$, let $r(x)$ be an $r_j$ with
$|x - r(x)| < \delta$. Whenever $n$ and $m$ are $\ge N$, we then have

$$\begin{aligned}
|f_n(x) - f_m(x)| &\le |f_n(x) - f_n(r(x))| + |f_n(r(x)) - f_m(r(x))| + |f_m(r(x)) - f_m(x)| \\
&< \epsilon + \epsilon + \epsilon = 3\epsilon.
\end{aligned}$$

Hence $\{f_{n_k}\}$ is uniformly Cauchy, and (b) follows from Proposition 1.17.


REMARK. The construction of the subsequence for which countably many 
convergence conditions were all satisfied is an important one and is often referred 
to as a diagonal process or as the Cantor diagonal process. 


EXAMPLE. Let $K$ and $M$ be positive constants, and let $\mathcal{F}$ be the set of continuous
real-valued functions $f$ on $[a, b]$ such that $|f(t)| \le K$ for $a \le t \le b$
and such that the derivative $f'(t)$ exists for $a < t < b$ and satisfies $|f'(t)| \le M$
there. This set of functions is certainly uniformly bounded by $K$, and we show
that it is also uniformly equicontinuous. To see the latter, we use the Mean Value
Theorem. If $x$ is in the closed interval $[a, b]$ and $t$ is in the open interval $(a, b)$,
then there exists $\xi$ depending on $t$ and $x$ such that

$$|f(t) - f(x)| = |f'(\xi)|\,|t - x| \le M|t - x|.$$

From this inequality it follows that the number $\delta$ of uniform equicontinuity for
$\epsilon$ and $\mathcal{F}$ can be taken to be $\epsilon/M$. The hypotheses of Ascoli’s Theorem are
satisfied, and it follows that any sequence of functions in $\mathcal{F}$ has a uniformly
convergent subsequence. The estimate of $\delta$ is independent of the uniform bound
$K$, yet Ascoli’s Theorem breaks down if there is no bound at all; for example, the
sequence of constant functions with $f_n(x) = n$ is uniformly equicontinuous but
has no convergent subsequence.


We turn now to the problem of interchange of derivative and limit. The two
indices again will be an integer $n$ that is tending to infinity and a parameter $t$ that
is tending to $x$. Proposition 1.16 takes away all the surprise in the statement of
the theorem, and it tells us the steps to follow in a proof. What the proposition
suggests is that the general entry in our interchange diagram should be whatever
quantity we want to take an iterated limit of in either order. Thus we expect not a
theorem about a general entry $f_n(t)$, but instead a theorem about a general entry
$\frac{f_n(t) - f_n(x)}{t - x}$. The limit on $n$ gives us $\frac{f(t) - f(x)}{t - x}$ for a limiting function $f$,
and then the limit as $t \to x$ gives us $f'(x)$. In the other order the limit as $t \to x$
gives us $f_n'(x)$, and then we are to consider the limit on $n$. If Proposition 1.16 is
to be a guide, we are to assume that the convergence in one variable is uniform
in the other. The proposition also suggests that if we have existence of each row
limit and each column limit, then uniform convergence when one variable occurs
first is equivalent to uniform convergence when the other variable occurs first.
Thus we should assume whichever is easier to verify.


Theorem 1.23. Suppose that $\{f_n\}$ is a sequence of real-valued functions
continuous for $a \le t \le b$ and differentiable for $a < t < b$ such that $\{f_n'\}$
converges uniformly for $a < t < b$ and $\{f_n(x_0)\}$ converges in $\mathbb{R}$ for some $x_0$ with
$a \le x_0 \le b$. Then $\{f_n\}$ converges uniformly for $a \le t \le b$ to a function $f$, and
$f'(x) = \lim_n f_n'(x)$ for $a < x < b$, with the derivative and the limit existing.


REMARKS. The convergence of $\{f_n(x_0)\}$ cannot be dropped completely as a
hypothesis because $f_n(t) = n$ would otherwise provide a counterexample. In
practice, $\{f_n\}$ will be known in advance to be uniformly convergent. However,
uniform convergence of $\{f_n\}$ is not enough by itself, as was shown by the example


$f_n(x) = \frac{\sin nx}{\sqrt{n}}$ in Section 2.


PROOF. The first step is to apply the Mean Value Theorem to $f_n - f_m$, estimate
$f_n' - f_m'$, and use the convergence of $\{f_n(x_0)\}$ to obtain the existence of the limit
function $f$. The Mean Value Theorem produces some $\xi$ strictly between $t$ and
$x_0$ such that

$$f_n(t) - f_m(t) = \bigl(f_n(x_0) - f_m(x_0)\bigr) + (t - x_0)\bigl(f_n'(\xi) - f_m'(\xi)\bigr).$$

Our hypotheses allow us to conclude that $\{f_n(t)\}$ is uniformly Cauchy, and thus
$\{f_n\}$ converges uniformly to a limit function $f$ by Proposition 1.17.

The second step is to apply the Mean Value Theorem again to $f_n - f_m$, this
time to see that

$$\varphi_n(t) = \frac{f_n(t) - f_n(x)}{t - x}$$

converges uniformly in $t$ (for $t \ne x$) as $n$ tends to infinity, the limit being
$\varphi(t) = \frac{f(t) - f(x)}{t - x}$. In fact, the Mean Value Theorem produces some $\xi$ strictly between
$t$ and $x$ such that

$$\varphi_n(t) - \varphi_m(t) = \frac{[f_n(t) - f_m(t)] - [f_n(x) - f_m(x)]}{t - x} = f_n'(\xi) - f_m'(\xi),$$

and the right side tends to 0 uniformly as $n$ and $m$ tend to infinity. Therefore
$\{\varphi_n(t)\}$ is uniformly Cauchy for $t \ne x$, and Proposition 1.17 shows that it is
uniformly convergent.

The third step is to extend the definition of $\varphi_n$ to $x$ by $\varphi_n(x) = f_n'(x)$ and
then to see that $\varphi_n$ is continuous at $x$ and Theorem 1.21 applies. In fact, the
definition of $\varphi_n(t)$ is as the difference quotient for the derivative of $f_n$ at $x$, and
thus $\varphi_n(t) \to f_n'(x) = \varphi_n(x)$. Hence $\varphi_n$ is continuous at $x$. We saw in the
second step that $\varphi_n(t)$ is uniformly convergent for $t \ne x$, and we are given that
$\varphi_n(x) = f_n'(x)$ is convergent. Therefore $\varphi_n(t)$ is uniformly convergent for all $t$
with

$$\lim_n \varphi_n(t) = \begin{cases} \dfrac{f(t) - f(x)}{t - x} & \text{for } t \ne x, \\[1ex] \lim_n f_n'(x) & \text{for } t = x. \end{cases}$$

Theorem 1.21 says that the limiting function $\lim_n \varphi_n(t)$ is continuous at $x$. Thus

$$\lim_{t\to x} \frac{f(t) - f(x)}{t - x} = \lim_n f_n'(x).$$

In other words, $f$ is differentiable at $x$ and $f'(x) = \lim_n f_n'(x)$.


4. Riemann Integral 


This section contains a careful but limited development of the Riemann integral 
in one variable. The reader is assumed to have a familiarity with Riemann sums 
at the level of a calculus course. The objective in this section is to prove that 
bounded functions with only finitely many discontinuities are Riemann inte- 
grable, to address the interchange-of-limits problem that arises with a sequence 
of functions and an integration, to prove the Fundamental Theorem of Calculus 
in the case of continuous integrand, to prove a change-of-variables formula, and 
to relate Riemann integrals to general Riemann sums. The Riemann integral in 
several variables will be treated in Chapter III, and some of the theorems to be 
proved in the several-variable case at that time will be results that have not been 
proved here in the one-variable case. In Chapters VI and VII, in the context 
of the Lebesgue integral, we shall prove a much more sweeping version of the 
Fundamental Theorem of Calculus. 

First we give the relevant definitions. We work with a function $f : [a, b] \to \mathbb{R}$
with $a \le b$ in $\mathbb{R}$, and we always assume that $f$ is bounded. A partition $P$ of
$[a, b]$ is a subdivision of the interval $[a, b]$ into subintervals, and we write such a
partition as

$$a = x_0 \le x_1 \le \cdots \le x_n = b.$$

The points $x_i$ will be called the subdivision points of the partition, and we may
abbreviate the partition as $P = \{x_i\}_{i=0}^{n}$. In order to permit integration over an
interval of zero length, we allow partitions in which two consecutive $x_i$’s are
equal; the multiplicity of $x_i$ is the number of times that $x_i$ occurs in the partition.
For the above partition, let

$$\Delta x_i = x_i - x_{i-1}, \qquad \mu(P) = \text{mesh of } P = \max_i \Delta x_i,$$

$$M_i = \sup_{x_{i-1} \le x \le x_i} f(x), \qquad m_i = \inf_{x_{i-1} \le x \le x_i} f(x).$$

Put

$$U(P, f) = \sum_{i=1}^{n} M_i\,\Delta x_i = \text{upper Riemann sum for } P,$$

$$L(P, f) = \sum_{i=1}^{n} m_i\,\Delta x_i = \text{lower Riemann sum for } P,$$

$$\overline{\int_a^b} f\,dx = \inf_P U(P, f) = \text{upper Riemann integral of } f,$$

$$\underline{\int_a^b} f\,dx = \sup_P L(P, f) = \text{lower Riemann integral of } f.$$

We say that $f$ is Riemann integrable on $[a, b]$ if $\overline{\int_a^b} f\,dx = \underline{\int_a^b} f\,dx$, and in
this case we write $\int_a^b f\,dx$ for the common value of these two numbers. We write
$\mathcal{R}[a, b]$ for the set of Riemann integrable functions on $[a, b]$.

If $f \ge 0$, an upper Riemann sum for $f$ may be visualized in the traditional
way as the sum of the areas of rectangles with bases $[x_{i-1}, x_i]$ and with heights
just sufficient to rise above the graph of $f$ on the interval $[x_{i-1}, x_i]$, and a lower
sum may be visualized similarly, using rectangles as large as possible so that they
lie below the graph.


EXAMPLES.

(1) Suppose $f(x) = c$ for $a \le x \le b$. No matter what partition $P$ is used,
we have $M_i = c$ and $m_i = c$. Therefore $U(P, f) = L(P, f) = c(b - a)$,
$\overline{\int_a^b} f\,dx = \underline{\int_a^b} f\,dx = c(b - a)$, and $f$ is Riemann integrable on $[a, b]$ with
$\int_a^b f\,dx = c(b - a)$.

(2) Let $[a, b]$ be arbitrary with $a < b$, and let $f$ be 1 on the rationals and 0 on
the irrationals. This $f$ is discontinuous at every point of $[a, b]$. No matter what
partition is used, we have $M_i = 1$ and $m_i = 0$ whenever $\Delta x_i > 0$. Therefore
$U(P, f) = b - a$ and $L(P, f) = 0$. Hence $\overline{\int_a^b} f\,dx = b - a$ and $\underline{\int_a^b} f\,dx = 0$,
and $f$ is not Riemann integrable.
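
Upper and lower Riemann sums can be computed exactly in simple cases. The sketch below rests on a stated assumption: it handles only nondecreasing functions, for which the supremum on each subinterval is the value at the right endpoint and the infimum is the value at the left endpoint. The names `riemann_sums_monotone` and `uniform_partition` are hypothetical helpers, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def riemann_sums_monotone(f, partition):
    """Return (U(P, f), L(P, f)) for a nondecreasing f and a sorted partition.

    For nondecreasing f, M_i = f(x_i) and m_i = f(x_{i-1}).
    """
    upper = Fraction(0)
    lower = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        dx = right - left
        upper += f(right) * dx
        lower += f(left) * dx
    return upper, lower

def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * Fraction(i, n) for i in range(n + 1)]

def square(x):
    return x * x

def constant_three(x):
    return Fraction(3)
```

For instance, `riemann_sums_monotone(square, uniform_partition(0, 1, 4))` gives upper sum 15/32 and lower sum 7/32, whose difference 1/4 equals the mesh times $f(1) - f(0)$. For the constant function both sums equal $c(b - a)$, as in Example 1.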

Let us work toward a proof that continuous functions are Riemann integrable.
We shall use some elementary properties of upper and lower Riemann sums along
with Theorem 1.10, which says that a continuous function on [a, b] is uniformly
continuous.


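Lemma 1.24. Suppose that $f : [a, b] \to \mathbb{R}$ has $m \le f(x) \le M$ for all $x$ in
$[a, b]$. Then

$$m(b - a) \le L(P, f) \le U(P, f) \le M(b - a),$$

$$m(b - a) \le \underline{\int_a^b} f\,dx \le M(b - a),$$

$$m(b - a) \le \overline{\int_a^b} f\,dx \le M(b - a).$$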


PROOF. The first conclusion follows from the computation

$$m(b - a) = \sum_{i=1}^{n} m\,\Delta x_i \le L(P, f) = \sum_{i=1}^{n} m_i\,\Delta x_i \le \sum_{i=1}^{n} M_i\,\Delta x_i = U(P, f) \le \sum_{i=1}^{n} M\,\Delta x_i = M(b - a).$$

If we concentrate on the first, third, and last members of the above inequalities
and take the supremum on $P$, then we obtain the second conclusion. Similarly if
we concentrate on the first, sixth, and last members of the above inequalities and
take the infimum on $P$, then we obtain the third conclusion.


A refinement of the partition $P$ is a partition $P^*$ containing all the subdivision
points of $P$, with at least their same multiplicities. If $P_1$ and $P_2$ are two partitions,
then $P_1$ and $P_2$ have at least one common refinement: one such common
refinement is obtained by taking the union of the subdivision points from each
and repeating each such point with the maximum of the multiplicities with which
it occurs in $P_1$ and $P_2$. We use this notion in order to prove a second lemma.


Lemma 1.25. Let $f : [a, b] \to \mathbb{R}$ satisfy $m \le f(x) \le M$ for all $x$ in $[a, b]$.
Then
(a) $L(P, f) \le L(P^*, f)$ and $U(P^*, f) \le U(P, f)$ whenever $P$ is a partition
of $[a, b]$ and $P^*$ is a refinement,
(b) $L(P_1, f) \le U(P_2, f)$ whenever $P_1$ and $P_2$ are partitions of $[a, b]$,
(c) $\underline{\int_a^b} f\,dx \le \overline{\int_a^b} f\,dx$,
(d) $\overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx \le (M - m)(b - a)$,
(e) the function $f$ is Riemann integrable on $[a, b]$ if and only if for each
$\epsilon > 0$, there exists a partition $P$ with $U(P, f) - L(P, f) < \epsilon$.


PROOF. In (a), it is enough to handle the case in which $P^*$ is obtained from $P$
by including one additional point, say $x^*$ between $x_{i-1}$ and $x_i$. The only possible
difference between $L(P, f)$ and $L(P^*, f)$ comes from $[x_{i-1}, x_i]$, and there we
have

$$\begin{aligned}
\inf_{x\in[x_{i-1},x_i]} f(x)\,(x_i - x_{i-1}) &= \inf_{x\in[x_{i-1},x_i]} f(x)\,(x_i - x^*) + \inf_{x\in[x_{i-1},x_i]} f(x)\,(x^* - x_{i-1}) \\
&\le \inf_{x\in[x^*,x_i]} f(x)\,(x_i - x^*) + \inf_{x\in[x_{i-1},x^*]} f(x)\,(x^* - x_{i-1}).
\end{aligned}$$

Hence $L(P, f) \le L(P^*, f)$, and similarly $U(P^*, f) \le U(P, f)$. This proves
(a).

Let $P^*$ be a common refinement of $P_1$ and $P_2$. Combining (a) with Lemma
1.24 gives

$$L(P_1, f) \le L(P^*, f) \le U(P^*, f) \le U(P_2, f).$$

This proves (b). Conclusion (c) follows by taking the supremum on $P_1$ and
then the infimum on $P_2$, and conclusion (d) follows by subtracting the second
conclusion of Lemma 1.24 from the third.

For (e), we have

$$L(P_1, f) \le \underline{\int_a^b} f\,dx \le \overline{\int_a^b} f\,dx \le U(P_2, f)$$

for any partitions $P_1$ and $P_2$ of $[a, b]$. Riemann integrability means that the center
two members of this inequality are equal. If they are not equal, then there certainly
can exist no $P$ with $U(P, f) - L(P, f) < \epsilon$ if $\epsilon = \overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx$. On the
other hand, equality of the center two members, together with the definitions of
the lower and upper Riemann integrals, means that for each $\epsilon > 0$, we can choose
$P_1$ and $P_2$ with $U(P_2, f) - L(P_1, f) < \epsilon$. Letting $P$ be a common refinement
of $P_1$ and $P_2$ and applying (a), we see that $U(P, f) - L(P, f) < \epsilon$. This proves
(e).
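
Conclusions (a) and (b) of Lemma 1.25 can be observed on a small exact example. The sketch below makes two simplifying assumptions that are not in the text: the function is nondecreasing, so that infima and suprema sit at the left and right endpoints, and subdivision points have multiplicity one, so that a common refinement is just the sorted union of the points. The names `lower_upper`, `common_refinement`, and `cube` are hypothetical.

```python
from fractions import Fraction

def lower_upper(f, partition):
    """Return (L(P, f), U(P, f)) for a nondecreasing f on a sorted partition."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper

def common_refinement(p1, p2):
    """Sorted union of the subdivision points (all multiplicities taken to be one)."""
    return sorted(set(p1) | set(p2))

def cube(x):
    return x ** 3

P1 = [Fraction(0), Fraction(1, 2), Fraction(1)]
P2 = [Fraction(0), Fraction(1, 3), Fraction(1)]
P_star = common_refinement(P1, P2)
```

Here $L(P_1, f) = 1/16$ and $U(P_1, f) = 9/16$, while the refinement gives $L(P^*, f) = 89/1296$ and $U(P^*, f) = 691/1296$: the lower sum has gone up and the upper sum has gone down, and $L(P_1, f) \le U(P_2, f)$ as in (b).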


Theorem 1.26. If $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$, then $f$ is Riemann
integrable on $[a, b]$.


PROOF. From Theorem 1.10 we know that $f$ is uniformly continuous on $[a, b]$.
Given $\epsilon > 0$, we can therefore choose some number $\delta > 0$ corresponding to $f$ and
$\epsilon$ on $[a, b]$. Let $P = \{x_i\}_{i=0}^{n}$ be a partition on $[a, b]$ of mesh $\mu(P) < \delta$. On any
subinterval $[x_{i-1}, x_i]$ corresponding to $P$, we have $m_i = f(\xi_i)$ and $M_i = f(\eta_i)$
for some $\xi_i$ and $\eta_i$ in $[x_{i-1}, x_i]$, by Theorem 1.11. Since $|\eta_i - \xi_i| \le |x_i - x_{i-1}| =
\Delta x_i \le \mu(P) < \delta$, we obtain $M_i - m_i = f(\eta_i) - f(\xi_i) < \epsilon$. Therefore

$$U(P, f) - L(P, f) = \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i \le \epsilon \sum_{i=1}^{n} \Delta x_i = \epsilon(b - a),$$

and the theorem follows from Lemma 1.25e.


We shall improve upon Theorem 1.26 by allowing finitely many points of 
discontinuity, but we need to do some additional work beforehand. 


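Lemma 1.27. If $f$ is bounded on $[a, b]$ and $a \le c \le b$, then
$\overline{\int_a^b} f\,dx = \overline{\int_a^c} f\,dx + \overline{\int_c^b} f\,dx$, and similarly for $\underline{\int}$. Consequently $f$ is in $\mathcal{R}[a, b]$ if and
only if $f$ is in both $\mathcal{R}[a, c]$ and $\mathcal{R}[c, b]$, and in this case,

$$\int_a^b f\,dx = \int_a^c f\,dx + \int_c^b f\,dx.$$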


REMARKS. After one is done developing the Riemann integral and its properties,
it is customary to adopt the convention that $\int_a^b f\,dx = -\int_b^a f\,dx$ when
$b < a$. One of the places that this convention is particularly helpful is in applying
the displayed formula of Lemma 1.27: the formula is then valid for all real $a, b, c$
without the assumption that $a, b, c$ are ordered in a particular way.


PROOF. If $P_1$ and $P_2$ are partitions of $[a, c]$ and $[c, b]$, respectively, let $P$ be
their “union,” which is obtained by using all the subdivision points $\ne c$ of each
partition, together with $c$ itself. The multiplicity of $c$ in $P$ is to be the larger of
the numbers of times $c$ occurs in $P_1$ and $P_2$. This $P$ is a partition of $[a, b]$. Then

$$\overline{\int_a^b} f\,dx \le U(P, f) = U(P_1, f) + U(P_2, f).$$

Taking the infimum over $P_1$ and then the infimum over $P_2$, we obtain

$$\overline{\int_a^b} f\,dx \le \overline{\int_a^c} f\,dx + \overline{\int_c^b} f\,dx.$$

For the reverse inequality, let $\epsilon > 0$ be given, and choose a partition $P$ of
$[a, b]$ with $U(P, f) - \overline{\int_a^b} f\,dx < \epsilon$. Let $P^*$ be the refinement of $P$ obtained by
adjoining $c$ to $P$ if $c$ is not a subdivision point of $P$ or by using $P$ itself if $c$ is a
subdivision point of $P$. Lemma 1.25a gives $U(P^*, f) - \overline{\int_a^b} f\,dx < \epsilon$. Because
$c$ is a subdivision point of $P^*$, the subdivision points $\le c$ give us a partition $P_1$ of
$[a, c]$ and the subdivision points $\ge c$ give us a partition $P_2$ of $[c, b]$. Moreover,
$P^*$ is the union of $P_1$ and $P_2$. Then we have

$$\overline{\int_a^b} f\,dx + \epsilon \ge U(P^*, f) = U(P_1, f) + U(P_2, f) \ge \overline{\int_a^c} f\,dx + \overline{\int_c^b} f\,dx.$$

Since $\epsilon$ is arbitrary, the lemma follows.


Lemma 1.28. Suppose that $f : [a, b] \to \mathbb{R}$ is bounded on $[a, b]$ and that
$a \le c \le b$. If for each $\delta > 0$, $f$ is Riemann integrable on each closed subinterval
of $[a, b] - \{x \mid |x - c| < \delta\}$, then $f$ is Riemann integrable on $[a, b]$.


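PROOF. We give the argument when $a < c < b$, the cases $c = a$ and $c = b$
being handled similarly. Since $f$ is by assumption bounded, find $m$ and $M$
with $m \le f(x) \le M$ for all $x \in [a, b]$. Choose $\delta > 0$ small enough so that
$a < c - \delta < c < c + \delta < b$. To simplify the notation, let us drop “$f\,dx$” from
all integrals. Since $f$ is by assumption Riemann integrable on $[a, c - \delta]$ and
$[c + \delta, b]$, Lemma 1.27 gives

$$\begin{aligned}
\overline{\int_a^b} &= \overline{\int_a^{c-\delta}} + \overline{\int_{c-\delta}^{c+\delta}} + \overline{\int_{c+\delta}^{b}} = \underline{\int_a^{c-\delta}} + \overline{\int_{c-\delta}^{c+\delta}} + \underline{\int_{c+\delta}^{b}} \\
&\le \underline{\int_a^{c-\delta}} + \Bigl(\underline{\int_{c-\delta}^{c+\delta}} + 2\delta(M - m)\Bigr) + \underline{\int_{c+\delta}^{b}} \\
&= \underline{\int_a^b} + 2\delta(M - m).
\end{aligned}$$

Since $\delta$ is arbitrary, $\overline{\int_a^b} = \underline{\int_a^b}$. The lemma follows.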


Proposition 1.29. If $f : [a, b] \to \mathbb{R}$ is bounded on $[a, b]$ and is continuous
at all but finitely many points of $[a, b]$, then $f$ is Riemann integrable on $[a, b]$.


REMARK. There is no assumption that $f$ has only jump discontinuities. For
example, the proposition applies if $[a, b] = [0, 1]$ and $f$ is the function with
$f(x) = \sin\frac{1}{x}$ for $x \ne 0$ and $f(0) = 0$.


PROOF. By Lemma 1.27 and induction, it is enough to handle the case that $f$ is
discontinuous at exactly one point, say $c$. Since $f$ is bounded and is continuous
at all points but $c$, Theorem 1.26 shows that the hypotheses of Lemma 1.28 are
satisfied. Therefore Lemma 1.28 shows that $f$ is Riemann integrable on $[a, b]$.

We shall now work toward a theorem about interchanging limits and integrals.
The preliminary step is to obtain some simple properties of Riemann integrals.


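Proposition 1.30. If $f$, $f_1$, and $f_2$ are Riemann integrable on $[a, b]$, then
(a) $f_1 + f_2$ is in $\mathcal{R}[a, b]$ and $\int_a^b (f_1 + f_2)\,dx = \int_a^b f_1\,dx + \int_a^b f_2\,dx$,
(b) $cf$ is in $\mathcal{R}[a, b]$ and $\int_a^b cf\,dx = c\int_a^b f\,dx$ for any real number $c$,
(c) $f_1 \le f_2$ on $[a, b]$ implies $\int_a^b f_1\,dx \le \int_a^b f_2\,dx$,
(d) $m \le f \le M$ on $[a, b]$ and $\varphi : [m, M] \to \mathbb{R}$ continuous imply that $\varphi \circ f$
is in $\mathcal{R}[a, b]$,
(e) $|f|$ is in $\mathcal{R}[a, b]$, and $\bigl|\int_a^b f\,dx\bigr| \le \int_a^b |f|\,dx$,
(f) $f^2$ and $f_1 f_2$ are in $\mathcal{R}[a, b]$,
(g) $\sqrt{f}$ is in $\mathcal{R}[a, b]$ if $f \ge 0$ on $[a, b]$,
(h) the function $g$ with $g(x) = f(-x)$ is in $\mathcal{R}[-b, -a]$ and satisfies
$\int_{-b}^{-a} g\,dx = \int_a^b f\,dx$.

REMARK. The proof of (c) will show, even without the assumption of Riemann
integrability, that $\underline{\int_a^b} f_1\,dx \le \underline{\int_a^b} f_2\,dx$ and $\overline{\int_a^b} f_1\,dx \le \overline{\int_a^b} f_2\,dx$. We shall make
use of this stronger conclusion later in this section.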


PROOF. For (a), write $f = f_1 + f_2$, and let $P$ be a partition. From

$$\inf_{x\in[x_{i-1},x_i]} f_1(x) + \inf_{x\in[x_{i-1},x_i]} f_2(x) \le \inf_{x\in[x_{i-1},x_i]} (f_1 + f_2)(x)$$

and a similar inequality with the supremum, we obtain

$$L(P, f_1) + L(P, f_2) \le L(P, f) \le U(P, f) \le U(P, f_1) + U(P, f_2). \tag{$*$}$$

Let $\epsilon > 0$ be given. By Lemma 1.25e, choose $P_1$ and $P_2$ with

$$U(P_1, f_1) - L(P_1, f_1) < \epsilon \quad\text{and}\quad U(P_2, f_2) - L(P_2, f_2) < \epsilon.$$

If $P$ is a common refinement of $P_1$ and $P_2$, then Lemma 1.25a gives

$$U(P, f_1) - L(P, f_1) < \epsilon \quad\text{and}\quad U(P, f_2) - L(P, f_2) < \epsilon.$$

Hence

$$U(P, f_1) \le \int_a^b f_1\,dx + \epsilon \le L(P, f_1) + 2\epsilon,$$

$$U(P, f_2) \le \int_a^b f_2\,dx + \epsilon \le L(P, f_2) + 2\epsilon,$$

and $(*)$ yields $U(P, f) - L(P, f) \le 4\epsilon$. Since $\epsilon$ is arbitrary, Lemma 1.25e
shows that $f$ is in $\mathcal{R}[a, b]$. From the inequalities for $U(P, f_1)$ and $U(P, f_2)$,
combined with the last inequality in $(*)$, we see that

$$\int_a^b f\,dx \le U(P, f) \le U(P, f_1) + U(P, f_2) \le \int_a^b f_1\,dx + \int_a^b f_2\,dx + 2\epsilon,$$

while the first inequality in $(*)$ shows that

$$\begin{aligned}
\int_a^b f_1\,dx + \int_a^b f_2\,dx + 2\epsilon &\le L(P, f_1) + L(P, f_2) + 4\epsilon \\
&\le L(P, f) + 4\epsilon \le \int_a^b f\,dx + 4\epsilon.
\end{aligned}$$

Since $\epsilon$ is arbitrary, we obtain $\int_a^b (f_1 + f_2)\,dx = \int_a^b f_1\,dx + \int_a^b f_2\,dx$. This
proves (a).

For (b), consider any subinterval $[x_{i-1}, x_i]$ of a partition, and let $m_i$ and $M_i$ be
the infimum and supremum of $f$ on this subinterval. Also, let $m_i'$ and $M_i'$ be the
infimum and supremum of $cf$ on this subinterval. If $c \ge 0$, then $M_i' = cM_i$ and
$m_i' = cm_i$, so that $(M_i' - m_i')\,\Delta x_i = |c|(M_i - m_i)\,\Delta x_i$. If $c \le 0$, then $M_i' = cm_i$
and $m_i' = cM_i$, so that we still have $(M_i' - m_i')\,\Delta x_i = |c|(M_i - m_i)\,\Delta x_i$. Summing
on $i$, we obtain $U(P, cf) - L(P, cf) = |c|\bigl(U(P, f) - L(P, f)\bigr)$, and (b) follows
from Lemma 1.25e.

For (c), we have $\overline{\int_a^b} f_1\,dx \le U(P, f_1) \le U(P, f_2)$ for all $P$. Taking the
infimum on $P$ in the inequality of the first and third members gives $\overline{\int_a^b} f_1\,dx \le
\overline{\int_a^b} f_2\,dx$. (Similarly $\underline{\int_a^b} f_1\,dx \le \underline{\int_a^b} f_2\,dx$, but this is not needed under the
hypothesis that $f_1$ and $f_2$ are Riemann integrable.)

For (d), let $K = \sup_{t\in[m,M]} |\varphi(t)|$. Let $\epsilon > 0$ be given, and choose by Theorem
1.10 some $\delta$ of uniform continuity for $\varphi$ and $\epsilon$. Without loss of generality, we
may assume that $\delta \le \epsilon$. By Lemma 1.25e, choose a partition $P = \{x_i\}_{i=0}^{n}$ of
$[a, b]$ such that $U(P, f) - L(P, f) < \delta^2$. On any subinterval $[x_{i-1}, x_i]$ of $P$,
let $m_i$ and $M_i$ be the infimum and supremum of $f$, and let $m_i'$ and $M_i'$ be the
infimum and supremum of $\varphi \circ f$. Divide the set of integers $\{1, \dots, n\}$ into two
subsets: the subset $A$ of integers $i$ with $M_i - m_i < \delta$ and the subset $B$ of integers
$i$ with $M_i - m_i \ge \delta$. If $i$ is in $A$, then the definition of $\delta$ makes $M_i' - m_i' \le \epsilon$. If
$i$ is in $B$, then the best we can say is that $M_i' - m_i' \le 2K$. However, on $B$ we do
have $M_i - m_i \ge \delta$, and thus

$$\delta \sum_{i\in B} \Delta x_i \le \sum_{i\in B} (M_i - m_i)\,\Delta x_i \le \sum_{i=1}^{n} (M_i - m_i)\,\Delta x_i = U(P, f) - L(P, f) < \delta^2.$$

Thus $\sum_{i\in B} \Delta x_i < \delta$ and

$$\begin{aligned}
U(P, \varphi\circ f) - L(P, \varphi\circ f) &= \sum_{i\in A} (M_i' - m_i')\,\Delta x_i + \sum_{i\in B} (M_i' - m_i')\,\Delta x_i \\
&\le \epsilon \sum_{i\in A} \Delta x_i + 2K \sum_{i\in B} \Delta x_i \\
&\le \epsilon(b - a) + 2K\delta \le \epsilon(b - a) + 2K\epsilon.
\end{aligned}$$

Since $\epsilon$ is arbitrary, the Riemann integrability of $\varphi \circ f$ follows from Lemma 1.25e.

For (e), the first conclusion follows from (d) with $\varphi(t) = |t|$. For the asserted
inequality we have $f \le |f|$ and $-f \le |f|$, so that (c) and (b) give $\int_a^b f\,dx \le
\int_a^b |f|\,dx$ and $-\int_a^b f\,dx \le \int_a^b |f|\,dx$. Combining these inequalities, we obtain
$\bigl|\int_a^b f\,dx\bigr| \le \int_a^b |f|\,dx$.

For (f), the first conclusion follows from (d) with $\varphi(t) = t^2$. For the Riemann
integrability of $f_1 f_2$, we use the formula $f_1 f_2 = \frac{1}{2}\bigl((f_1 + f_2)^2 - f_1^2 - f_2^2\bigr)$ and
the earlier parts of the proposition.

Conclusion (g) follows from (d) with $\varphi(t) = \sqrt{t}$.

For (h), each partition $P$ of $[a, b]$ yields a natural partition $P'$ of $[-b, -a]$ by
using the negatives of the partition points. When $P$ and $P'$ are matched in this way,
$U(P, f) = U(P', g)$ and $L(P, f) = L(P', g)$. It is immediate that $f \in \mathcal{R}[a, b]$
implies $g \in \mathcal{R}[-b, -a]$ and that $\int_{-b}^{-a} g\,dx = \int_a^b f\,dx$. This completes the proof.


The next topic is the problem of interchange of integral and limit. 


EXAMPLE. On the interval $[0, 1]$, define $f_n(x)$ to be $n$ for $0 < x < 1/n$ and to
be 0 otherwise. Proposition 1.29 shows that $f_n$ is Riemann integrable, and Lemma
1.27 allows us to see that $\int_0^1 f_n\,dx = 1$ for all $n$. On the other hand, $\lim_n f_n(x) = 0$
for all $x \in [0, 1]$. Since $\int_0^1 0\,dx = 0$, we have $\lim_n \int_0^1 f_n\,dx = 1 \ne 0 = \int_0^1 \lim_n f_n\,dx$.
Thus the interchange is not justified without some additional hypothesis.
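
The failure of the interchange in this example can be traced in exact arithmetic. In the sketch below the names `f` and `integral_of_f` are hypothetical, and the integral is not computed from Riemann sums: it uses the fact, taken from the example itself, that $f_n$ is a step function of height $n$ on an interval of length $1/n$.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n for 0 < x < 1/n and 0 otherwise, for x in [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_of_f(n):
    """Integral of f_n over [0, 1]: height n times interval length 1/n."""
    return n * Fraction(1, n)

def first_index_where_zero(x):
    """Smallest n with f_n(x) = 0 for a fixed x in (0, 1]; f_m(x) = 0 for all m >= n."""
    n = 1
    while f(n, x) != 0:
        n += 1
    return n
```

Every integral equals 1, yet for each fixed $x$ the values $f_n(x)$ are eventually 0; at $x = 1/7$, for example, $f_6(x) = 6$ but $f_n(x) = 0$ for all $n \ge 7$. The supremum of $|f_n - 0|$ is $n$, so the convergence is far from uniform.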


Theorem 1.31. If $\{f_n\}$ is a sequence of Riemann integrable functions on $[a, b]$
and if $\{f_n\}$ converges uniformly to $f$ on $[a, b]$, then $f$ is Riemann integrable on
$[a, b]$, and $\lim_n \int_a^b f_n\,dx = \int_a^b f\,dx$.


REMARKS. Proposition 1.16 suggests considering a “matrix” whose entries
are the quantities for which we are computing iterated limits, and these quantities
are $U(P, f_n)$ here. (Alternatively, we could use $L(P, f_n)$.) The hypothesis of
uniformity in the statement of Theorem 1.31, however, concerns $f_n$, not $U(P, f_n)$.
In fact, the tidy hypothesis on $f_n$ in the statement of the theorem implies a less
intuitive hypothesis on $U(P, f_n)$ that has not been considered. The proof conceals
these details.
PROOF. Using the uniform Cauchy criterion with $\epsilon = 1$, we see that there
exists $N$ such that $|f_n(x)| \le M_N + 1$ for all $x$ whenever $n \ge N$. It follows
from the boundedness of $f_1, \dots, f_{N-1}$ that the $|f_n|$ are uniformly bounded, say
by $M$. Then also $|f(x)| \le M$ for all $x$. Put $\epsilon_n = \sup_x |f_n(x) - f(x)|$, so that
$f_n - \epsilon_n \le f \le f_n + \epsilon_n$. Proposition 1.30c and the remark with the proposition,
combined with Lemma 1.25c, then yield

\[
\int_a^b (f_n - \epsilon_n)\,dx \le \underline{\int_a^b} f\,dx \le \overline{\int_a^b} f\,dx \le \int_a^b (f_n + \epsilon_n)\,dx.
\]

Hence $\overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx \le 2\epsilon_n(b - a)$ for all $n$. The uniform convergence of
$\{f_n\}$ to $f$ forces $\epsilon_n$ to tend to 0, and thus $f$ is in $\mathcal{R}[a, b]$. The displayed equation,
in light of the Riemann integrability of $f$, shows that

\[
\Bigl|\int_a^b f\,dx - \int_a^b f_n\,dx\Bigr| \le 2\epsilon_n(b - a).
\]

The right side tends to 0, and therefore $\lim_n \int_a^b f_n\,dx = \int_a^b f\,dx$.


EXAMPLE. Let $f : [0, 1] \to \mathbb{R}$ be defined by

\[
f(x) = \begin{cases}
1/q & \text{if $x$ is the rational $p/q$ in lowest terms,} \\
0 & \text{if $x$ is irrational.}
\end{cases}
\]


This function is discontinuous at every rational and is continuous at every irra- 
tional. Its Riemann integrability is not settled by Proposition 1.29. Define 


\[
f_n(x) = \begin{cases}
1/q & \text{if $x$ is the rational $p/q$ in lowest terms, $q \le n$,} \\
0 & \text{if $x$ is the rational $p/q$ in lowest terms, $q > n$,} \\
0 & \text{if $x$ is irrational.}
\end{cases}
\]


Proposition 1.29 shows that $f_n$ is Riemann integrable, and Lemma 1.27 shows that
$\int_0^1 f_n\,dx = 0$. Since $|f_n(x) - f(x)| \le 1/n$ for all $x$, $\{f_n\}$ converges uniformly
to $f$. By Theorem 1.31, $f$ is Riemann integrable and $\int_0^1 f\,dx = 0$.
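
The uniform estimate $|f_n(x) - f(x)| \le 1/n$ can be spot-checked with exact rational arithmetic. This is an illustrative sketch only: the function names and the denominator cutoff `max_q` are assumptions introduced here, and only rationals are sampled, since both functions vanish at irrationals.

```python
from fractions import Fraction


def f(x):
    """Value 1/q at the rational x = p/q; Fraction keeps x in lowest terms."""
    return Fraction(1, x.denominator)


def f_n(n, x):
    """Truncation from the example: keep the value 1/q only when q <= n."""
    return Fraction(1, x.denominator) if x.denominator <= n else Fraction(0)


def sup_gap(n, max_q):
    """Largest |f_n - f| over the rationals p/q in [0, 1] with q <= max_q."""
    points = {Fraction(p, q) for q in range(1, max_q + 1) for p in range(q + 1)}
    return max(abs(f_n(n, x) - f(x)) for x in points)


gaps = {n: sup_gap(n, 60) for n in (1, 5, 20)}
print(gaps)
```

On this sample the largest gap is attained at denominator $n + 1$, so it stays below $1/n$ as the example asserts.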


Theorem 1.32 (Fundamental Theorem of Calculus). If $f : [a, b] \to \mathbb{R}$ is
continuous, then
(a) the function $G(x) = \int_a^x f\,dt$ is differentiable for $a < x < b$ with
derivative $f(x)$, and it is continuous at $a$ and $b$ with $G(a) = 0$,
(b) any continuous function $F$ on $[a, b]$ that is differentiable for $a < x < b$
with derivative $f(x)$ has $\int_a^b f\,dt = F(b) - F(a)$.




REMARK. The derivative of $G(x)$ on $(a, b)$, namely $f(x)$, has the finite limits
$f(a)$ and $f(b)$ at the endpoints of the interval, since $f$ has been assumed to be
continuous on $[a, b]$. Thus, in the sense of the last paragraph of Section A2 of the
appendix, $G(x)$ has the continuous derivative $f(x)$ on the closed interval $[a, b]$.


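PROOF OF (a). Riemann integrability of $f$ is known from Theorem 1.26. For
$h > 0$ small enough to make $x + h < b$, Lemma 1.27 and Proposition 1.30 give

\[
\begin{aligned}
\frac{G(x + h) - G(x)}{h} - f(x)
&= \frac{1}{h}\Bigl[\int_a^{x+h} f\,dt - \int_a^x f\,dt\Bigr] - f(x) \\
&= \frac{1}{h}\int_x^{x+h} f\,dt - f(x) \\
&= \frac{1}{h}\int_x^{x+h} [f(t) - f(x)]\,dt,
\end{aligned}
\]

and hence

\[
\Bigl|\frac{G(x + h) - G(x)}{h} - f(x)\Bigr| \le \frac{1}{h}\int_x^{x+h} |f(t) - f(x)|\,dt.
\]

If $\epsilon > 0$ is given, choose the $\delta$ of continuity for $f$ and $\epsilon$ at $x$. Then $0 < h < \delta$
implies that the right side is $\le \epsilon$. For negative $h$, we instead take $h > 0$ and
consider

\[
\frac{G(x - h) - G(x)}{-h} - f(x) = \frac{1}{h}\int_{x-h}^x f\,dt - f(x) = \frac{1}{h}\int_{x-h}^x [f(t) - f(x)]\,dt.
\]

Then

\[
\Bigl|\frac{G(x - h) - G(x)}{-h} - f(x)\Bigr| \le \frac{1}{|h|}\int_{x-h}^x |f(t) - f(x)|\,dt \le \epsilon,
\]

as required.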


PROOF OF (b). The functions F and G are two continuous functions on [a, b] 
with equal derivative on (a, b). A corollary of the Mean Value Theorem stated 
in Section A2 of the appendix implies G = F + c for some constant c. Then 


\[
\int_a^b f\,dt = G(b) - 0 = G(b) - G(a) = F(b) + c - F(a) - c = F(b) - F(a).
\]
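
Both parts of the theorem can be illustrated numerically. The sketch below is an illustration under assumptions that are not in the text: the integrand $f = \cos$ with antiderivative $F = \sin$, the helper name `integral`, and the particular values of $x$ and $h$ are all chosen here for convenience.

```python
import math


def integral(f, a, b, steps=20_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h


f = math.cos
a, b, x, h = 0.0, 2.0, 1.3, 1e-4


def G(u):
    """Numerical stand-in for G(u), the integral of f from a to u."""
    return integral(f, a, u)


# Part (a): the difference quotient of G at x should be close to f(x).
diff_quotient = (G(x + h) - G(x)) / h

# Part (b): with F = sin, the integral over [a, b] should match F(b) - F(a).
part_b_gap = integral(f, a, b) - (math.sin(b) - math.sin(a))

print(diff_quotient, f(x), part_b_gap)
```

The difference quotient agrees with $f(x)$ up to an error of order $h$, which mirrors the estimate by $\epsilon$ in the proof of (a).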


Corollary 1.33 (integration by parts). Let f and g be real-valued functions 
defined and having a continuous derivative on [a, b]. Then 


\[
\int_a^b f(x)g'(x)\,dx = \bigl[f(x)g(x)\bigr]_a^b - \int_a^b f'(x)g(x)\,dx.
\]


REMARK. The notion of a continuous derivative at the endpoints of an interval 
is discussed in the last paragraph of Section A2 of the appendix. 
PROOF. We start from the product rule for differentiation, namely

\[
\frac{d}{dx}\bigl(f(x)g(x)\bigr) = f(x)g'(x) + f'(x)g(x),
\]

and we apply $\int_a^b$ to both sides. Taking Theorem 1.32 into account, we obtain the
desired formula.


Theorem 1.34 (change-of-variables formula). Let $f$ be Riemann integrable
on $[a, b]$, let $\varphi$ be a continuous strictly increasing function from an interval $[A, B]$
onto $[a, b]$, suppose that the inverse function $\varphi^{-1} : [a, b] \to [A, B]$ is continuous,
and suppose finally that $\varphi$ is differentiable on $(A, B)$ with uniformly continuous
derivative $\varphi'$. Then the product $(f \circ \varphi)\varphi'$ is Riemann integrable on $[A, B]$, and

\[
\int_a^b f(x)\,dx = \int_A^B f(\varphi(y))\varphi'(y)\,dy.
\]


REMARKS. The uniform continuity of $\varphi'$ forces $\varphi'$ to be bounded. If $\varphi'$ were
also assumed positive on $(A, B)$, then the continuity of $\varphi^{-1}$ on $(a, b)$ would be
automatic as a consequence of the proposition in Section A3 of the appendix.
The result in the appendix is not quite good enough for current purposes, and thus
we have assumed the continuity of $\varphi^{-1}$ on $[a, b]$. It will be seen in Section II.7
that the continuity of $\varphi^{-1}$ on $[a, b]$ is automatic in the statement of Theorem 1.34
and need not be assumed.


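PROOF IF $f \ge 0$. Given $\epsilon > 0$, choose some $\eta$ of uniform continuity for $\varphi'$ and
$\epsilon$, and then choose, by Theorem 1.10, some $\delta$ of uniform continuity for $\varphi^{-1}$ and $\eta$.
Next choose a partition $P = \{x_i\}_{i=0}^n$ on $[a, b]$ such that $U(P, f) - L(P, f) < \epsilon$.
Possibly by passing to a refinement of $P$, we may assume that $\mu(P) < \delta$. Let $Q$
be the partition $\{y_i\}_{i=0}^n$ of $[A, B]$ with $y_i = \varphi^{-1}(x_i)$. Then $\mu(Q) < \eta$.

The Mean Value Theorem gives $\Delta x_i = (\Delta y_i)\varphi'(\xi_i)$ for some $\xi_i$ between $y_{i-1}$
and $y_i$. On $[A, B]$, $\varphi'$ is bounded; let $m_i^*$ and $M_i^*$ be the infimum and supremum
of $\varphi'$ on $[y_{i-1}, y_i]$, so that $m_i^* \le \varphi'(\xi_i) \le M_i^*$ and $m_i^* \Delta y_i \le \Delta x_i \le M_i^* \Delta y_i$.
Since $\mu(Q) < \eta$, we have $M_i^* - m_i^* \le \epsilon$. Then we have

\[
\sum_{i=1}^n M_i m_i^* \Delta y_i \le \sum_{i=1}^n M_i \Delta x_i = U(P, f) = \sum_{i=1}^n M_i \varphi'(\xi_i)\Delta y_i \le \sum_{i=1}^n M_i M_i^* \Delta y_i.
\]

Whenever $F$ and $G$ are $\ge 0$ on a common domain and $x$ is in that domain,
$(\inf G)F(x) \le G(x)F(x)$; taking the supremum of both sides gives the inequality
$(\inf G)(\sup F) \le \sup(FG)$. Also, $\sup(FG) \le \sup(F)\sup(G)$. Applying these
inequalities with $G = \varphi'$ and $F = f \circ \varphi$ yields

\[
\sum_{i=1}^n M_i m_i^* \Delta y_i \le U(Q, (f \circ \varphi)\varphi') \le \sum_{i=1}^n M_i M_i^* \Delta y_i.
\]

Subtraction of the right-hand inequality of the first display and the left-hand
inequality of the second display shows that

\[
U(P, f) - U(Q, (f \circ \varphi)\varphi') \le \sum_{i=1}^n M_i (M_i^* - m_i^*)\Delta y_i,
\]

while subtraction of the right-hand inequality of the second display and the
left-hand inequality of the first display gives

\[
U(Q, (f \circ \varphi)\varphi') - U(P, f) \le \sum_{i=1}^n M_i (M_i^* - m_i^*)\Delta y_i.
\]

Therefore

\[
|U(P, f) - U(Q, (f \circ \varphi)\varphi')| \le \sum_{i=1}^n M_i (M_i^* - m_i^*)\Delta y_i \le \epsilon M(B - A).
\]

Similarly $|L(P, f) - L(Q, (f \circ \varphi)\varphi')| \le \epsilon M(B - A)$,
and hence

\[
\begin{aligned}
|U(Q, (f \circ \varphi)\varphi') - L(Q, (f \circ \varphi)\varphi')|
&\le |U(Q, (f \circ \varphi)\varphi') - U(P, f)| + |U(P, f) - L(P, f)| \\
&\quad + |L(P, f) - L(Q, (f \circ \varphi)\varphi')| \\
&\le 2\epsilon M(B - A) + \epsilon.
\end{aligned}
\]

Since $\epsilon$ is arbitrary, Lemma 1.25e shows that $(f \circ \varphi)\varphi'$ is in $\mathcal{R}[A, B]$. Our
inequalities imply that

\[
\begin{aligned}
\Bigl|\int_a^b f\,dx - U(P, f)\Bigr| &\le \epsilon, \\
|U(P, f) - U(Q, (f \circ \varphi)\varphi')| &\le \epsilon M(B - A), \\
\text{and}\quad \Bigl|U(Q, (f \circ \varphi)\varphi') - \int_A^B (f \circ \varphi)\varphi'\,dy\Bigr| &\le 2\epsilon M(B - A) + \epsilon.
\end{aligned}
\]

Addition shows that $\bigl|\int_a^b f\,dx - \int_A^B (f \circ \varphi)\varphi'\,dy\bigr| \le 2\epsilon + 3\epsilon M(B - A)$. Since $\epsilon$
is arbitrary, $\int_a^b f\,dx = \int_A^B (f \circ \varphi)\varphi'\,dy$.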


PROOF FOR GENERAL f. The special case just proved shows that the result 
holds for f + c for a suitable positive constant c, as well as for the constant 
function c. Subtracting the results for f +c and c gives the result for f, and the 
proof is complete. 
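
As a sanity check of the change-of-variables formula, the two sides can be compared numerically for one concrete choice. The sketch below is an illustration, not the author's computation: the integrand $f(x) = 1/x$, the substitution $\varphi(y) = y^2$ on $[A, B] = [1, 2]$, and the helper name `integral` are all assumptions made here.

```python
import math


def integral(g, lo, hi, steps=20_000):
    """Midpoint Riemann sum approximating the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h


f = lambda x: 1.0 / x          # integrand on [a, b]
phi = lambda y: y * y          # strictly increasing map of [A, B] onto [a, b]
dphi = lambda y: 2.0 * y       # derivative of phi, uniformly continuous on (A, B)

A, B = 1.0, 2.0
a, b = phi(A), phi(B)          # a = 1, b = 4

lhs = integral(f, a, b)                               # integral of f over [a, b]
rhs = integral(lambda y: f(phi(y)) * dphi(y), A, B)   # integral of (f o phi) phi' over [A, B]

print(lhs, rhs, math.log(4.0))
```

Both sums approximate $\log 4$, in agreement with the theorem.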
If $f$ is Riemann integrable on $[a, b]$, then $U(P, f)$ and $L(P, f)$ tend to $\int_a^b f\,dx$
as $P$ gets finer by insertion of points. This conclusion tells us nothing about
fine-looking partitions like those that are equally spaced with many subdivisions. The
next theorem says that the approximating sums tend to $\int_a^b f\,dx$ just under the
assumption that $\mu(P)$ tends to 0.

Relative to our standard partition $P = \{x_i\}_{i=0}^n$, let $t_i$ for $1 \le i \le n$ satisfy
$x_{i-1} \le t_i \le x_i$, and define

\[
S(P, \{t_i\}, f) = \sum_{i=1}^n f(t_i)\,\Delta x_i.
\]

This is called a Riemann sum of $f$.


Theorem 1.35. If $f$ is Riemann integrable on $[a, b]$, then

\[
\lim_{\mu(P) \to 0} S(P, \{t_i\}, f) = \int_a^b f\,dx.
\]

Conversely if $f$ is bounded on $[a, b]$ and if there exists a real number $r$ such
that for any $\epsilon > 0$, there exists some $\delta > 0$ for which $|S(P, \{t_i\}, f) - r| < \epsilon$
whenever $\mu(P) < \delta$, then $f$ is Riemann integrable on $[a, b]$.


PROOF. For the direct part the function $f$ is assumed bounded; suppose
$|f(x)| \le M$ on $[a, b]$. Let $\epsilon > 0$ be given. Choose a partition $P^*$ of $[a, b]$ with
$U(P^*, f) \le \int_a^b f\,dx + \epsilon$. Say $P^*$ is a partition into $k$ intervals. Put $\delta_1 = \frac{\epsilon}{Mk}$,
and suppose that $P$ is any partition of $[a, b]$ with $\mu(P) \le \delta_1$. In the sum giving
$U(P, f)$, we divide the terms into two types: those from a subinterval of $P$ that
does not lie within one subinterval of $P^*$ and those from a subinterval of $P$ that
does lie within one subinterval of $P^*$.

Each subinterval of $P$ of the first kind has at least one point of $P^*$ strictly
inside it, and the number of such subintervals is therefore $\le k - 1$. Hence the
sum of the corresponding terms of $U(P, f)$ is

\[
\le (k - 1)M\mu(P) \le \frac{(k - 1)M\epsilon}{Mk} \le \epsilon.
\]

The sum of the terms of the second kind is $\le U(P^*, f)$. Thus

\[
U(P, f) \le \epsilon + U(P^*, f) \le \int_a^b f\,dx + 2\epsilon.
\]

Similarly we can produce $\delta_2$ such that $\mu(P) \le \delta_2$ implies

\[
L(P, f) \ge \int_a^b f\,dx - 2\epsilon.
\]

If $\delta = \min\{\delta_1, \delta_2\}$ and $\mu(P) \le \delta$, then

\[
\int_a^b f\,dx - 2\epsilon \le L(P, f) \le S(P, \{t_i\}, f) \le U(P, f) \le \int_a^b f\,dx + 2\epsilon,
\]

and hence $\bigl|S(P, \{t_i\}, f) - \int_a^b f\,dx\bigr| \le 2\epsilon$.

For the converse let $\epsilon > 0$ be given, and choose some $\delta$ as in the statement of the
theorem. Next choose a partition $P = \{x_i\}_{i=0}^n$ with $\bigl|U(P, f) - \overline{\int_a^b} f\,dx\bigr| < \epsilon$
and $\bigl|\underline{\int_a^b} f\,dx - L(P, f)\bigr| < \epsilon$; possibly by passing to a refinement of $P$, we
may assume without loss of generality that $\mu(P) < \delta$. Choosing $\{t_i\}_{i=1}^n$ suitably
for the partition $P$, we can make $|U(P, f) - S(P, \{t_i\}, f)| < \epsilon$. For a possibly
different choice of the set of intermediate points, say $\{t_i'\}$, we can make
$|S(P, \{t_i'\}, f) - L(P, f)| < \epsilon$. Then

\[
\begin{aligned}
\Bigl|\overline{\int_a^b} f\,dx - \underline{\int_a^b} f\,dx\Bigr|
&\le \Bigl|U(P, f) - \overline{\int_a^b} f\,dx\Bigr| + |U(P, f) - S(P, \{t_i\}, f)| \\
&\quad + |S(P, \{t_i\}, f) - r| + |r - S(P, \{t_i'\}, f)| \\
&\quad + |S(P, \{t_i'\}, f) - L(P, f)| + \Bigl|L(P, f) - \underline{\int_a^b} f\,dx\Bigr| \\
&< 6\epsilon.
\end{aligned}
\]

Since $\epsilon$ is arbitrary, the Riemann integrability of $f$ follows from Lemma 1.25e.
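
The direct part of Theorem 1.35 says that only the mesh $\mu(P)$ matters, not the shape of the partition or the choice of intermediate points. A small experiment makes this concrete. Everything in the sketch is an assumption made for illustration: the test function $f(x) = x^2$ on $[0, 1]$, the random partitions and tags, the seed, and the helper names.

```python
import random


def riemann_sum(f, points, tags):
    """S(P, {t_i}, f): sum of f(t_i) * (x_i - x_{i-1}) over the subintervals."""
    return sum(f(t) * (x1 - x0) for x0, x1, t in zip(points, points[1:], tags))


def random_tagged_partition(a, b, n, rng):
    """Random partition of [a, b] into n subintervals with one random tag in each."""
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + interior + [b]
    tags = [rng.uniform(x0, x1) for x0, x1 in zip(points, points[1:])]
    return points, tags


rng = random.Random(0)
f = lambda x: x * x            # its integral over [0, 1] is 1/3

results = []                   # pairs (mesh, |Riemann sum - 1/3|)
for n in (10, 100, 1000, 10_000):
    pts, tags = random_tagged_partition(0.0, 1.0, n, rng)
    mesh = max(x1 - x0 for x0, x1 in zip(pts, pts[1:]))
    results.append((mesh, abs(riemann_sum(f, pts, tags) - 1.0 / 3.0)))

for mesh, err in results:
    print(f"mesh = {mesh:.5f}   error = {err:.2e}")
```

For this particular $f$ the error is at most $2\mu(P)$, because $|f'| \le 2$ on $[0, 1]$, so the printed errors shrink with the mesh no matter how irregular the partition is.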


With integration in hand, one could at this point give rigorous definitions of
the logarithm and exponential functions $\log x$ and $\exp x$, as well as rigorous but
inconvenient definitions of the trigonometric functions $\sin x$, $\cos x$, and $\tan x$. For
each of these functions we would obtain a formula for the derivative and other
information. We shall not pursue this approach, but we pause to mention the idea.
We put $\log x = \int_1^x t^{-1}\,dt$ for $0 < x < +\infty$ and see that $\log$ carries $(0, +\infty)$
one-one onto $(-\infty, +\infty)$. The function $\log x$ has derivative $1/x$ and satisfies the
functional equation $\log(xy) = \log x + \log y$. The proposition in Section A3 of the
appendix shows that the inverse function $\exp$ exists, carries $(-\infty, +\infty)$ one-one
onto $(0, +\infty)$, is differentiable, and has derivative $\exp x$. The functional equation
of $\log$ translates into the functional equation $\exp(a + b) = \exp a \exp b$ for $\exp$,
and we readily derive as a consequence that $\exp x = e^x$, where $e = \exp 1$. For the
trigonometric functions, the starting points with this approach are the definitions
$\arctan x = \int_0^x (1 + t^2)^{-1}\,dt$, $\arcsin x = \int_0^x (1 - t^2)^{-1/2}\,dt$, and $\pi = 4\arctan 1$.

Instead of using this approach, we shall use power series to define these 
functions and to obtain their expected properties. We do so in Section 7. 
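
The integral definitions mentioned above can at least be tried out numerically. The following sketch is illustrative only: the names `log_int` and `arctan_int`, the midpoint rule, and the step count are assumptions made here, and the comparison with `math.pi` is a check against the library value rather than part of any definition.

```python
import math


def integral(g, lo, hi, steps=100_000):
    """Midpoint Riemann sum approximating the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h


def log_int(x):
    """log x taken as the integral of 1/t from 1 to x."""
    return integral(lambda t: 1.0 / t, 1.0, x)


def arctan_int(x):
    """arctan x taken as the integral of 1/(1 + t^2) from 0 to x."""
    return integral(lambda t: 1.0 / (1.0 + t * t), 0.0, x)


# Functional equation log(xy) = log x + log y, checked at x = 2, y = 3.
functional_gap = log_int(6.0) - (log_int(2.0) + log_int(3.0))

# The definition pi = 4 arctan 1.
pi_estimate = 4.0 * arctan_int(1.0)

print(functional_gap, pi_estimate)
```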
Complex numbers are taken as known, and their notation and basic properties
are reviewed in Section A4 of the appendix. The point of the present section is 
to extend some of the results for real-valued functions in earlier sections so that 
they apply also to complex-valued functions. 

The distance between two members z and w of C is defined by d(z, w) = 
|z — w|. This has the properties 


(i) $d(z_1, z_2) \ge 0$ with equality if and only if $z_1 = z_2$,

(ii) $d(z_1, z_2) = d(z_2, z_1)$,

(iii) $d(z_1, z_2) \le d(z_1, z_3) + d(z_3, z_2)$.

The first two are immediate from the definition, and the third follows from the
triangle inequality of Section A4 of the appendix with $z = z_1 - z_3$ and $w = z_3 - z_2$.
For this reason, (iii) is called the triangle inequality also.

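Convergence of a sequence $\{z_n\}$ in $\mathbb{C}$ to $z$ has two possible interpretations:
either $\{\mathrm{Re}\, z_n\}$ converges to $\mathrm{Re}\, z$ and $\{\mathrm{Im}\, z_n\}$ converges to $\mathrm{Im}\, z$, or $d(z_n, z)$
converges to 0 in $\mathbb{R}$. These interpretations come to the same thing because

\[
\max\{|\mathrm{Re}\, w|, |\mathrm{Im}\, w|\} \le |w| \le \sqrt{2}\,\max\{|\mathrm{Re}\, w|, |\mathrm{Im}\, w|\}.
\]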


Then it follows that uniform convergence of a sequence of complex-valued 
functions has two equivalent meanings, so does continuity of a complex-valued 
function at a point or everywhere, and so does differentiation of a complex- 
valued function. We readily check that all the results of Section 3, starting with 
Proposition 1.16 and ending with Theorem 1.23, extend to be valid for complex- 
valued functions as well as real-valued functions. 

The one point that requires special note in connection with Section 3 is the 
Mean Value Theorem. This theorem is valid for real-valued functions but not 
for complex-valued functions. It is possible to give an example now if we again 
allow ourselves to use the exponential and trigonometric functions before we 
get to Section 7, where the tools will be available for rigorous definitions. The 
example is $f(x) = e^{ix}$ for $x \in [0, 2\pi]$. This function has $f(0) = f(2\pi) = 1$,
but the derivative $f'(x) = ie^{ix}$ is never 0.
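
The counterexample is easy to confirm numerically. This sketch is an illustration that, like the text, borrows the library exponential before Section 7; the sampling grid of 1001 points is an arbitrary assumption.

```python
import cmath
import math

f = lambda x: cmath.exp(1j * x)          # f(x) = e^{ix}
df = lambda x: 1j * cmath.exp(1j * x)    # f'(x) = i e^{ix}

# f takes the same value at both endpoints of [0, 2*pi] ...
endpoint_gap = abs(f(2 * math.pi) - f(0.0))

# ... yet |f'(x)| equals 1 at every sample point, so f' has no zero there.
min_speed = min(abs(df(2 * math.pi * k / 1000)) for k in range(1001))

print(endpoint_gap, min_speed)
```

A real-valued function with equal endpoint values would be forced by the Mean Value Theorem to have a point of zero derivative; here $|f'|$ never drops below 1.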

The Mean Value Theorem was used in the proof of Theorem 1.23, but the 
failure of the Mean Value Theorem for complex-valued functions causes us no 
problem when we seek to extend Theorem 1.23 to complex-valued functions. 
The reason is that once Theorem 1.23 has been proved for real-valued functions, 
one simply puts together conclusions about the real and imaginary parts. 

Next we examine how the results of Section 4 may be extended to complex- 
valued functions. Upper and lower Riemann sums, of course, make no sense for 
a complex-valued function. It is possible to make sense out of general Riemann 
sums as in Theorem 1.35, but we shall not base a definition on this approach. 
Instead, we simply define definite integrals of a function $f : \mathbb{R} \to \mathbb{C}$ in terms
of real and imaginary parts. Define the real and imaginary parts $u = \mathrm{Re}\, f$ and
$v = \mathrm{Im}\, f$ by $f(x) = u(x) + iv(x)$, and let $\int_a^b f\,dx = \int_a^b u\,dx + i\int_a^b v\,dx$. We
can then redefine the set $\mathcal{R}[a, b]$ of Riemann integrable functions on $[a, b]$ to
consist of bounded complex-valued functions on $[a, b]$ whose real and imaginary
parts are each Riemann integrable.

Most properties of definite integrals go over to the case of complex-valued 
functions by inspection; there are two properties that deserve some discussion: 

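(i) If $f$ is in $\mathcal{R}[a, b]$ and $c$ is complex, then $cf$ is in $\mathcal{R}[a, b]$ and
$\int_a^b cf\,dx = c\int_a^b f\,dx$.
(ii) If $f$ is in $\mathcal{R}[a, b]$, then $|f|$ is in $\mathcal{R}[a, b]$ and $\bigl|\int_a^b f\,dx\bigr| \le \int_a^b |f|\,dx$.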

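To see (i), write $f = u + iv$ and $c = r + is$. Then $cf = (r + is)(u + iv) =
(ru - sv) + i(rv + su)$. The functions $ru - sv$ and $rv + su$ are Riemann integrable
on $[a, b]$, and hence so is $cf$. Then

\[
\begin{aligned}
\int_a^b cf\,dx &= \int_a^b (ru - sv)\,dx + i\int_a^b (rv + su)\,dx \\
&= r\int_a^b u\,dx - s\int_a^b v\,dx + ir\int_a^b v\,dx + is\int_a^b u\,dx \\
&= r\int_a^b (u + iv)\,dx + is\int_a^b (u + iv)\,dx = c\int_a^b f\,dx.
\end{aligned}
\]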


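To see (ii), let $f$ be in $\mathcal{R}[a, b]$. Proposition 1.30 shows successively that
$(\mathrm{Re}\, f)^2$ and $(\mathrm{Im}\, f)^2$ are in $\mathcal{R}[a, b]$, that $(\mathrm{Re}\, f)^2 + (\mathrm{Im}\, f)^2 = |f|^2$ is in $\mathcal{R}[a, b]$,
and that $\sqrt{|f|^2} = |f|$ is in $\mathcal{R}[a, b]$. For the inequality with $\bigl|\int_a^b f\,dx\bigr|$, choose
$c \in \mathbb{C}$ with $|c| = 1$ such that $c\int_a^b f\,dx$ is real and nonnegative, i.e., equals
$\bigl|\int_a^b f\,dx\bigr|$. Using (i), we obtain (ii) from

\[
\Bigl|\int_a^b f\,dx\Bigr| = c\int_a^b f\,dx = \int_a^b cf\,dx = \int_a^b \mathrm{Re}(cf)\,dx
\le \int_a^b |cf|\,dx = \int_a^b |f|\,dx.
\]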
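
Property (ii) can be observed on a concrete complex-valued integrand. The sketch below is an illustration under assumptions made here: the function $f(x) = (1 + x)e^{ix}$ on $[0, 3]$, the midpoint rule, and the helper name `integral`. The library exponential is used only for convenience.

```python
import cmath


def integral(g, lo, hi, steps=20_000):
    """Midpoint Riemann sum; works for complex-valued g as well as real-valued g."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h


f = lambda x: (1.0 + x) * cmath.exp(1j * x)   # complex-valued, with |f(x)| = 1 + x

lhs = abs(integral(f, 0.0, 3.0))                # modulus of the integral of f
rhs = integral(lambda x: abs(f(x)), 0.0, 3.0)   # integral of |f|, exactly 7.5

print(lhs, rhs)
```

The oscillation of $e^{ix}$ produces cancellation in the integral of $f$, so the left side is noticeably smaller than the right side.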


Finally we observe that Theorem 1.35 extends to complex-valued functions
$f$. The definition of Riemann sum is unchanged, namely $S(P, \{t_i\}, f) =
\sum_{i=1}^n f(t_i)\,\Delta x_i$, and the statement of Theorem 1.35 is unchanged except that the
number $r$ is now allowed to be complex. The direct part of the extended theorem
follows by applying Theorem 1.35 to the real and imaginary parts of $f$ separately.
For the converse we use that the inequality $|S(P, \{t_i\}, f) - c| < \epsilon$ with $c$ complex
implies $|S(P, \{t_i\}, \mathrm{Re}\, f) - \mathrm{Re}\, c| < \epsilon$ and $|S(P, \{t_i\}, \mathrm{Im}\, f) - \mathrm{Im}\, c| < \epsilon$.
Theorem 1.35 for real-valued functions then shows that $\mathrm{Re}\, f$ and $\mathrm{Im}\, f$ are Riemann
integrable, and hence so is $f$.

6. Taylor’s Theorem with Integral Remainder


There are several forms to the remainder term in the one-variable Taylor’s 
Theorem for real-valued functions, and the differences already show up in their 
lowest-order formulations. Let f be given, and, for definiteness, suppose a < x. 
If o(1) denotes a term that tends to 0 as x tends to a, three such lowest-order 
formulas are 


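\[
\begin{aligned}
f(x) &= f(a) + o(1) && \text{if $f$ is merely assumed to be continuous,} \\
f(x) &= f(a) + (x - a)f'(\xi) \ \text{with } a < \xi < x && \text{if $f$ is continuous on $[a, x]$ and $f'$ exists on $(a, x)$,} \\
f(x) &= f(a) + \int_a^x f'(t)\,dt && \text{if $f$ and $f'$ are continuous on $[a, x]$.}
\end{aligned}
\]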


The first formula follows directly from the definition of continuity, while the sec- 
ond formula restates the Mean Value Theorem and the third formula restates part 
of the Fundamental Theorem of Calculus. The hypotheses of the three formulas 
increase in strength, and so do the conclusions. In practice, Taylor’s Theorem 
is most often used with functions having derivatives of all orders, and then the 
strongest hypothesis is satisfied. Thus we state a general theorem corresponding 
only to the third formula above. It applies to complex-valued functions as well 
as real-valued functions. 


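Theorem 1.36 (Taylor’s Theorem). Let $n$ be an integer $\ge 0$, let $a$ and $x$
be points of $\mathbb{R}$, and let $f$ be a complex-valued function with $n + 1$ continuous
derivatives on the closed interval from $a$ to $x$. Then

\[
f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_n(a, x),
\]

where

\[
R_n(a, x) = \begin{cases}
\dfrac{1}{n!}\displaystyle\int_a^x (x - t)^n f^{(n+1)}(t)\,dt & \text{if } a \le x, \\[2ex]
-\dfrac{1}{n!}\displaystyle\int_x^a (x - t)^n f^{(n+1)}(t)\,dt & \text{if } x \le a.
\end{cases}
\]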
n! J, 


REMARKS. The notion of a continuous derivative at the endpoints of an interval 
is discussed for real-valued functions in the last paragraph of Section A2 of the 
appendix and extends immediately to complex-valued functions; iteration of this 
definition attaches a meaning to continuous higher-order derivatives on a closed 
interval. Once the convention in the remarks with Lemma 1.27 is adopted, namely 
that $\int_a^x f\,dt = -\int_x^a f\,dt$ when $x < a$, the formula for the remainder term
becomes tidier:

\[
R_n(a, x) = \frac{1}{n!}\int_a^x (x - t)^n f^{(n+1)}(t)\,dt,
\]

with no assumption that $a < x$.


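PROOF. We give the argument when $a < x$, the case $x < a$ being handled
analogously. The proof is by induction on $n$. For $n = 0$, the formula is immediate
from the Fundamental Theorem of Calculus (Theorem 1.32b). Assume that the
formula holds for $n - 1$. We apply integration by parts (Corollary 1.33) to the
remainder term at stage $n - 1$, obtaining

\[
\int_a^x (x - t)^{n-1} f^{(n)}(t)\,dt = -\frac{1}{n}\bigl[(x - t)^n f^{(n)}(t)\bigr]_a^x + \frac{1}{n}\int_a^x (x - t)^n f^{(n+1)}(t)\,dt.
\]

Substitution gives

\[
\begin{aligned}
R_{n-1}(a, x) &= \frac{1}{(n - 1)!}\int_a^x (x - t)^{n-1} f^{(n)}(t)\,dt \\
&= -\frac{1}{n!}\bigl[(x - t)^n f^{(n)}(t)\bigr]_a^x + \frac{1}{n!}\int_a^x (x - t)^n f^{(n+1)}(t)\,dt \\
&= \frac{1}{n!}(x - a)^n f^{(n)}(a) + R_n(a, x),
\end{aligned}
\]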


and the induction is complete. 
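
Taylor's formula with integral remainder can be tested numerically by computing both pieces independently. The sketch below is illustrative and rests on assumptions introduced here: $f = \exp$ (so every derivative is again $\exp$), the values $a = 0$, $x = 1.5$, $n = 4$, and the helper name `remainder_integral`, which evaluates the remainder with a midpoint rule.

```python
import math


def remainder_integral(n, a, x, deriv, steps=20_000):
    """Midpoint approximation of (1/n!) * integral from a to x of (x - t)^n * deriv(t) dt.

    `deriv` must be the (n+1)-st derivative of the function being expanded.
    """
    h = (x - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += (x - t) ** n * deriv(t)
    return total * h / math.factorial(n)


a, x, n = 0.0, 1.5, 4

# Taylor polynomial of exp about a, through degree n (all derivatives equal exp(a)).
taylor_poly = sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))

# Difference between exp(x) and polynomial plus integral remainder.
gap = math.exp(x) - (taylor_poly + remainder_integral(n, a, x, math.exp))

print(taylor_poly, gap)
```

Because the step `h` is negative when `x < a`, the same function also follows the sign convention of the remarks for that case.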
7. Power Series and Special Functions 


A power series is an infinite series of the form $\sum_{n=0}^\infty c_n z^n$. Normally in
mathematics, if nothing is said to the contrary, the coefficients $c_n$ are assumed to be
complex and the variable z is allowed to be complex. However, in the context of 
real-variable theory, as when forming derivatives of functions defined on intervals, 
one is interested only in real values of z. In this book the context will generally 
make clear whether the variable is to be regarded as complex or as real. 


One source of power series is the “infinite Taylor series” $\sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!}x^n$ of
a function $f$ having derivatives of all orders, with the remainder terms discarded.
In this case the variable is to be real. If the series is convergent at $x$, the series
has sum $f(x)$ if and only if $\lim_n R_n(0, x) = 0$. Later in this section, we shall see
examples where the limit is identically 0 and where it is nowhere 0 for $x \ne 0$.

Theorem 1.37. If a power series $\sum_{n=0}^\infty c_n z^n$ is convergent in $\mathbb{C}$ for some
complex $z_0$ with $|z_0| = R$ and if $R' < R$, then $\sum_{n=0}^\infty |c_n z^n|$ is uniformly convergent
for complex $z$ with $|z| \le R'$, and so is $\sum_{n=0}^\infty (n + 1)|c_{n+1} z^n|$.


REMARKS. The number

\[
R = \sup\Bigl\{R' \Bigm| \sum_{n=0}^\infty c_n z_0^n \text{ converges for some $z_0$ with } |z_0| = R'\Bigr\}
\]

is called the radius of convergence of $\sum_{n=0}^\infty c_n z^n$. The theorem says that if
$R' < R$, then $\sum_{n=0}^\infty |c_n z^n|$ converges uniformly for $|z| \le R'$, and it follows from
the uniform Cauchy criterion that $\sum_{n=0}^\infty c_n z^n$ converges uniformly for $|z| \le R'$.
The definition of $R$ carries with it the implication that if $z_0$ has $|z_0| > R$, then
$\sum_{n=0}^\infty c_n z_0^n$ diverges.

PROOF. The theorem is vacuous unless $R > 0$. Since $\sum_{n=0}^\infty c_n z_0^n$ is convergent,
the terms $c_n z_0^n$ tend to 0. Thus there is some integer $N$ for which $|c_n|R^n \le 1$
when $n \ge N$. Fix $R' < R$. For $|z| \le R'$ and $n \ge N$, we have

\[
|c_n z^n| = |c_n z_0^n|\,\Bigl|\frac{z}{z_0}\Bigr|^n = |c_n|R^n\Bigl(\frac{|z|}{R}\Bigr)^n \le \Bigl(\frac{R'}{R}\Bigr)^n.
\]

Since $\sum_n (R'/R)^n < +\infty$, the Weierstrass M test shows that $\sum_{n=0}^\infty |c_n z^n|$ converges
uniformly for $|z| \le R'$.

For the series $\sum_{n=0}^\infty (n + 1)|c_{n+1} z^n|$, the inequalities $|z| \le R'$ and $n \ge N$
together imply

\[
(n + 1)|c_{n+1} z^n| \le (n + 1)|c_{n+1}|R^n\Bigl(\frac{R'}{R}\Bigr)^n \le (n + 1)R^{-1}\Bigl(\frac{R'}{R}\Bigr)^n.
\]

To see that the Weierstrass M test applies here as well, choose $r'$ with $R'/R <
r' < 1$ and increase the size of $N$ so that $\frac{n+2}{n+1}\,\frac{R'}{R} \le r'$ whenever $n \ge N$. For such
$n$, the ratio test and the inequality

\[
\frac{(n + 2)R^{-1}(R'/R)^{n+1}}{(n + 1)R^{-1}(R'/R)^n} = \frac{n + 2}{n + 1}\,\frac{R'}{R} \le r'
\]

show that $\sum_n (n + 1)R^{-1}(R'/R)^n$ converges. Thus the Weierstrass M test indeed
applies, and the proof is complete.


Corollary 1.38. If $\sum_{n=0}^\infty c_n x^n$ converges for $|x| < R$ and the sum of the series
for $x$ real is denoted by $f(x)$, then the function $f$ has derivatives of all orders
for $|x| < R$. These derivatives are given by term-by-term differentiation of the
series for $f$, and each differentiated series converges for $|x| < R$. Moreover,

\[
c_k = \frac{f^{(k)}(0)}{k!}.
\]

REMARK. When a function has derivatives of all orders, we say that it is
infinitely differentiable.


PROOF. The corollary is vacuous unless $R > 0$. Let $R' < R$. The given
series certainly converges at $x = 0$, and Theorem 1.37 shows that the term-by-term
differentiated series converges uniformly for $|x| \le R'$. Thus Theorem 1.23
gives $f'(x) = \sum_{n=0}^\infty (n + 1)c_{n+1}x^n$ for $|x| < R'$. Since $R' < R$ is arbitrary,
$f'(x) = \sum_{n=0}^\infty (n + 1)c_{n+1}x^n$ for $|x| < R$.

We can iterate this result to obtain the corresponding conclusion for the
higher-order derivatives. Evaluating the derivatives at 0, we obtain $f^{(k)}(0) = c_k k!$, as
asserted.


Corollary 1.39. If $\sum_{n=0}^\infty c_n x^n$ and $\sum_{n=0}^\infty d_n x^n$ both converge for $|x| < R$ and
if their sums are equal for $x$ real with $|x| < R$, then $c_n = d_n$ for all $n$.


PROOF. This result is immediate from the formula for the coefficients in 
Corollary 1.38. 


If $f : \mathbb{R} \to \mathbb{C}$ is infinitely differentiable near $x = a$, we call the infinite
series $\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x - a)^n$ the (infinite) Taylor series of $f$. We call a series
$\sum_{n=0}^\infty c_n(x - a)^n$ a power series about $x = a$; its behavior at $x = a + t$ is the
same as the behavior of the series $\sum_{n=0}^\infty c_n x^n$ at $x = t$. In applications, one
usually adjusts the function $f$ so that Taylor series expansions are about $x = 0$.
Thus we shall concentrate largely on power series expansions about $x = 0$.

Had we chosen at the end of Section 4 to define $\log x$ as $\int_1^x t^{-1}\,dt$ and $\exp x$ as
the inverse function of $\log x$, we would have found right away that $\bigl(\frac{d}{dx}\bigr)^k \exp x =
\exp x$ for all $k$. Therefore the infinite Taylor series expansion of $\exp x$ about $x = 0$
is $\sum_{n=0}^\infty \frac{x^n}{n!}$. This fact does not, however, tell us whether $\exp x$ is the sum of this
series. For this purpose we need to examine the remainder. Theorem 1.36 shows
that the remainder after the term $x^n/n!$ is

\[
R_n(0, x) = \frac{1}{n!}\int_0^x (x - t)^n f^{(n+1)}(t)\,dt = \frac{1}{n!}\int_0^x (x - t)^n e^t\,dt.
\]

Between 0 and $x$, $e^t$ is bounded by some constant $M(x)$ depending on $x$, and thus
$|R_n(0, x)| \le \frac{M(x)}{n!}\bigl|\int_0^x (x - t)^n\,dt\bigr| = \frac{M(x)}{(n+1)!}|x|^{n+1}$. With $x$ fixed, this tends to 0 as
$n$ tends to infinity, and thus $\lim_n R_n(0, x) = 0$ for each $x$. The conclusion is that
$\exp x = \sum_{n=0}^\infty \frac{x^n}{n!}$. In a similar fashion one can obtain power series expansions of
$\sin x$ and $\cos x$ if one starts from definitions of the corresponding inverse functions
in terms of Riemann integrals.

Instead of using this approach, we shall define $\exp x$, $\sin x$, and $\cos x$ directly
as sums of standard power series. An advantage of using series in the definitions
is that this approach allows us to define these functions at an arbitrary complex
$z$, not just at a real $x$. Thus we define

\[
\exp z = \sum_{n=0}^\infty \frac{z^n}{n!}, \qquad
\sin z = \sum_{n=0}^\infty \frac{(-1)^n z^{2n+1}}{(2n + 1)!}, \qquad
\cos z = \sum_{n=0}^\infty \frac{(-1)^n z^{2n}}{(2n)!}.
\]


The ratio test shows immediately that these series all converge for all complex $z$.
Inspection of all these series gives us the identity

\[
\exp iz = \cos z + i \sin z.
\]
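
The three defining series and the identity relating them can be checked with partial sums. This is a minimal sketch: the function names, the numbers of terms, and the sample point $z = 0.7 - 1.2i$ are assumptions chosen here, and `cmath.exp` appears only as an outside comparison.

```python
import cmath
from math import factorial


def exp_series(z, terms=60):
    """Partial sum of sum z^n / n!."""
    return sum(z ** n / factorial(n) for n in range(terms))


def sin_series(z, terms=30):
    """Partial sum of sum (-1)^n z^(2n+1) / (2n+1)!."""
    return sum((-1) ** n * z ** (2 * n + 1) / factorial(2 * n + 1) for n in range(terms))


def cos_series(z, terms=30):
    """Partial sum of sum (-1)^n z^(2n) / (2n)!."""
    return sum((-1) ** n * z ** (2 * n) / factorial(2 * n) for n in range(terms))


z = 0.7 - 1.2j

# exp(iz) versus cos z + i sin z, all computed from the series alone.
euler_gap = abs(exp_series(1j * z) - (cos_series(z) + 1j * sin_series(z)))

# Comparison of the series for exp with the library exponential.
library_gap = abs(exp_series(z) - cmath.exp(z))

print(euler_gap, library_gap)
```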


Corollary 1.38 shows that the functions $\exp z$, $\sin z$, and $\cos z$, when considered
as functions of a real variable $z = x$, are infinitely differentiable with derivatives
given by the expected formulas

\[
\frac{d}{dx}\exp x = \exp x, \qquad \frac{d}{dx}\sin x = \cos x, \qquad \frac{d}{dx}\cos x = -\sin x.
\]

From these formulas it is immediate that $\frac{d}{dx}(\sin^2 x + \cos^2 x) = 0$ for all $x$.
Therefore $\sin^2 x + \cos^2 x$ is constant. Putting $x = 0$ shows that the constant is 1.
Thus

\[
\sin^2 x + \cos^2 x = 1.
\]


In order to prove that exp x = e*, where e = exp 1, and to prove other familiar 
trigonometric identities, we shall do some calculations with power series that are 
justified by the following theorem. 


Theorem 1.40. If $f(z) = \sum_{n=0}^\infty a_n z^n$ and $g(z) = \sum_{n=0}^\infty b_n z^n$ for complex $z$
with $|z| < R$, then $f(z)g(z) = \sum_{n=0}^\infty c_n z^n$ for $|z| < R$, where

\[
c_n = a_n b_0 + a_{n-1} b_1 + \cdots + a_0 b_n.
\]


REMARK. In other words, the rule is to multiply the series formally, assuming
a kind of infinite distributive law, and reassemble the series by grouping terms
with like powers of $z$. The coefficient $c_n$ of $z^n$ in the product comes from all
products $a_k z^k b_l z^l$ for which the total degree is $n$, i.e., for which $k + l = n$. Thus
$c_n$ is as indicated.
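
The coefficient rule is easy to carry out on truncated coefficient lists. The sketch below is an illustration; the function name `cauchy_product` and the choice of test series are assumptions made here. Multiplying the series $\sum z^n/n!$ by itself should give coefficients $2^n/n!$, since $\sum_{k=0}^n \frac{1}{k!\,(n-k)!} = \frac{2^n}{n!}$ by the binomial theorem.

```python
from fractions import Fraction
from math import factorial


def cauchy_product(a, b):
    """Coefficients c_n = a_n b_0 + a_{n-1} b_1 + ... + a_0 b_n for n < min(len(a), len(b))."""
    size = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(size)]


N = 12
exp_coeffs = [Fraction(1, factorial(n)) for n in range(N)]     # coefficients 1/n!

product = cauchy_product(exp_coeffs, exp_coeffs)
expected = [Fraction(2 ** n, factorial(n)) for n in range(N)]  # coefficients 2^n/n!

print(product == expected)
```

Exact rational arithmetic is used so that the comparison is an equality of coefficients rather than a floating-point approximation.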


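PROOF. The theorem is vacuous unless $R > 0$. Fix $R' < R$. For $|z| \le R'$,
put $F(z) = \sum_{n=0}^\infty |a_n z^n|$ and $G(z) = \sum_{n=0}^\infty |b_n z^n|$. These series are uniformly
convergent for $|z| \le R'$ by Theorem 1.37, and also $|f(z)| \le F(z)$ and $|g(z)| \le
G(z)$. By the uniform convergence of the series for $F$ and $G$ when $|z| \le R'$,
there exists $M < +\infty$ such that $F(z) \le M$ and $G(z) \le M$ for $|z| \le R'$. Given
$\epsilon > 0$, choose an integer $N'$ such that $|z| \le R'$ implies $\sum_{n \ge N'} |a_n z^n| < \epsilon$ and
$\sum_{n \ge N'} |b_n z^n| < \epsilon$. If $|z| \le R'$ and $N \ge 2N'$, then

\[
\begin{aligned}
\Bigl|f(z)g(z) - \sum_{n=0}^N c_n z^n\Bigr|
&\le \Bigl|f(z)g(z) - \Bigl(\sum_{n=0}^N a_n z^n\Bigr)\Bigl(\sum_{n=0}^N b_n z^n\Bigr)\Bigr| \\
&\quad + \Bigl|\Bigl(\sum_{n=0}^N a_n z^n\Bigr)\Bigl(\sum_{n=0}^N b_n z^n\Bigr) - \sum_{n=0}^N c_n z^n\Bigr|.
\end{aligned}
\]

Call the two terms on the right side $T_1$ and $T_2$. Then we have

\[
T_1 \le \Bigl|f(z) - \sum_{n=0}^N a_n z^n\Bigr|\,|g(z)| + \Bigl|\sum_{n=0}^N a_n z^n\Bigr|\,\Bigl|g(z) - \sum_{n=0}^N b_n z^n\Bigr| \le \epsilon G(z) + \epsilon F(z),
\]

and also, with $[N/2]$ denoting the greatest integer in $N/2$,

\[
\begin{aligned}
T_2 = \Bigl|\sum_{\substack{k, l \le N \\ k + l > N}} a_k b_l z^{k+l}\Bigr|
&\le \sum_{\substack{k, l \le N \\ k + l > N}} |a_k z^k|\,|b_l z^l| \\
&\le \sum_{k > [N/2]} \sum_{l \ge 0} |a_k z^k|\,|b_l z^l| + \sum_{l > [N/2]} \sum_{k \ge 0} |a_k z^k|\,|b_l z^l| \\
&\le \epsilon G(z) + \epsilon F(z).
\end{aligned}
\]

Since $G(z) \le M$ and $F(z) \le M$ for $|z| \le R'$, the total estimate is that $T_1 + T_2 \le
4\epsilon M$. Since $\epsilon$ is arbitrary, we conclude that $\lim_N \sum_{n=0}^N c_n z^n = f(z)g(z)$ for
$|z| \le R'$. Since $R'$ is an arbitrary number $< R$, we conclude that $\sum_{n=0}^\infty c_n z^n =
f(z)g(z)$ for $|z| < R$.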


Corollary 1.41. For any $z$ and $w$ in $\mathbb{C}$, $\exp(z + w) = \exp z \exp w$. Furthermore,
$\exp \bar{z} = \overline{\exp z}$.

PROOF. Theorem 1.40 and the infinite radius of convergence allow us to write 


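\[
\exp z \exp w = \Bigl(\sum_{r=0}^\infty \frac{z^r}{r!}\Bigr)\Bigl(\sum_{s=0}^\infty \frac{w^s}{s!}\Bigr)
= \sum_{N=0}^\infty \sum_{k=0}^N \frac{z^k w^{N-k}}{k!\,(N-k)!}
= \sum_{N=0}^\infty \frac{1}{N!}\sum_{k=0}^N \binom{N}{k} z^k w^{N-k}
= \sum_{N=0}^\infty \frac{(z + w)^N}{N!} = \exp(z + w).
\]
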
For the second formula, write $z = x + iy$. Then

\[
\begin{aligned}
\exp \bar{z} &= \exp(x - iy) = \exp x \exp(-iy) = (\exp x)(\cos y - i \sin y) \\
&= \overline{(\exp x)(\cos y + i \sin y)} = \overline{\exp x \exp(iy)} = \overline{\exp(x + iy)} = \overline{\exp z}.
\end{aligned}
\]


Corollary 1.42. The exponential function $\exp x$, as a function of a real variable,
has the following properties:


(a) $\exp$ is strictly increasing on $(-\infty, +\infty)$ and is one-one onto $(0, +\infty)$,

(b) $\exp x = e^x$, where $e = \exp 1$,

(c) $\exp x$ has an inverse function, denoted by $\log x$, that is strictly increasing,
carries $(0, +\infty)$ one-one onto $(-\infty, +\infty)$, has derivative $1/x$, and
satisfies $\log(xy) = \log x + \log y$.


REMARKS. The three facts that $\exp x = e^x$ for $x$ real, $\exp z$ satisfies the
functional equation of Corollary 1.41 for $z$ complex, and $e^z$ is previously undefined
for $z$ nonreal allow one to define $e^z$ to mean $\exp z$ for all complex $z$. We follow
this convention. In particular, $e^{ix} = \exp(ix) = \cos x + i \sin x$.


PROOF. For $x \ge 0$, we certainly have $\exp x \ge 1$. Also, each nonconstant term of the
series for $\exp x$ is strictly increasing for $x \ge 0$, and hence the same thing is true
of the sum of the series. From Corollary 1.41, $\exp(-x)\exp x = \exp 0 = 1$,
and thus $\exp x$ is strictly increasing for $x \le 0$ with $0 < \exp x \le 1$. Putting
these statements together, we see that $\exp x$ is strictly increasing and positive on
$(-\infty, +\infty)$. Hence it is one-one. This proves part of (a).

Since $\exp x > 0$, it makes sense to consider rational powers of $\exp x$. Iteration
of the identity $\exp(z + w) = \exp z \exp w$ shows that $\bigl(\exp \frac{px}{q}\bigr)^q = \exp px =
(\exp x)^p$, and application of the $q$th root function gives $\exp \frac{px}{q} = (\exp x)^{p/q}$.
Taking $x = 1$ yields $\exp(p/q) = e^{p/q}$ for all rational $p/q$. The two functions
$\exp x$ and $e^x$ are continuous functions of a real variable that are equal when $x$ is
rational, and hence they are equal for all $x$. This proves (b).

From the first two terms of the series for $\exp 1$, we see that $e > 2$. Therefore
$e^n > 2^n > n$ for all positive integers $n$, and $\exp x$ has arbitrarily large numbers
in its image. Since $\exp 0 = 1$, the Intermediate Value Theorem (Theorem 1.12) then shows that
$[1, +\infty)$ is contained in its image. Since $\exp(-x)\exp x = 1$, the interval $(0, 1]$
is contained in the image as well. Thus $\exp x$ carries $(-\infty, +\infty)$ onto $(0, +\infty)$.
This proves the remainder of (a).

Consequently exp x has an inverse function, which is denoted by log x. Since 
exp x has the continuous everywhere-positive derivative exp x , the proposition in 
Section A3 of the appendix applies and shows that log x is differentiable with 
derivative 1 / exp(log x). Since exp and log are inverse functions, exp(log x) = x. 
Thus the derivative of log x is 1/x. 
Finally $\exp(\log x + \log y) = \exp(\log x)\exp(\log y) = xy$, since $\exp$ and $\log$
are inverse functions. Applying $\log$ to both sides gives $\log x + \log y = \log(xy)$.
This proves (c).


Corollary 1.43. The trigonometric functions sin x and cos x, as functions of 
a real variable, satisfy 
(a) sin(x + y) = sinx cos y + cosx sin y, 
(b) cos(x + y) =cosx cos y — sinx siny. 


PROOF. By Corollary 1.41, $\cos(x + y) + i\sin(x + y) = e^{i(x+y)} = e^{ix}e^{iy} =
(\cos x + i \sin x)(\cos y + i \sin y)$. Multiplying out the right side and equating real
and imaginary parts yields the corollary.


The final step in the foundational work with the trigonometric functions is to
define $\pi$ and to establish the role that it plays with trigonometric functions.


Proposition 1.44. The function $\cos x$, with $x$ real, has a smallest positive $x_0$
for which $\cos x_0 = 0$. If $\pi$ is defined by writing $x_0 = \pi/2$, then
(a) $\sin x$ is strictly increasing, hence one-one, from $[0, \frac{\pi}{2}]$ onto $[0, 1]$, and
$\cos x$ is strictly decreasing, hence one-one, from $[0, \frac{\pi}{2}]$ onto $[0, 1]$,

(b) $\sin(-x) = -\sin x$ and $\cos(-x) = \cos x$,

(c) $\sin(x + \frac{\pi}{2}) = \cos x$ and $\cos(x + \frac{\pi}{2}) = -\sin x$,
(d) $\sin(x + \pi) = -\sin x$ and $\cos(x + \pi) = -\cos x$,
(e) $\sin(x + 2\pi) = \sin x$ and $\cos(x + 2\pi) = \cos x$.


PROOF. The function $\cos x$ has $\cos 0 = 1$. Arguing by contradiction, suppose
that $\cos x$ is nowhere 0 for $x > 0$. By the Intermediate Value Theorem (Theorem
1.12), $\cos x > 0$ for $x \ge 0$. Since $\sin x$ is 0 at 0 and has derivative $\cos x$, $\sin x$ is
strictly increasing for $x \ge 0$ and is therefore positive for $x > 0$. Since $\cos x$ has
derivative $-\sin x$, $\cos x$ is strictly decreasing for $x \ge 0$. Let us form the function
$f(x) = (\cos x - \cos 1) + (\sin 1)(x - 1)$. If there is some $x_1 > 1$ with $f(x_1) > 0$,
then the Mean Value Theorem produces some $\xi$ with $1 < \xi < x_1$ such that

\[
0 < f(x_1) = f(x_1) - f(1) = (x_1 - 1)f'(\xi) = (x_1 - 1)(-\sin \xi + \sin 1) < 0,
\]

and we have a contradiction. Thus $f(x) \le 0$ for all $x \ge 1$. In other words, $\cos x \le
\cos 1 - (\sin 1)(x - 1)$ for all $x \ge 1$. For $x$ sufficiently large, $\cos 1 - (\sin 1)(x - 1)$
is negative, and we see that $\cos x$ has to be negative for $x$ sufficiently large. The
result is a contradiction, and we conclude that $\cos x$ is 0 for some $x > 0$. Let
$x_0$ be the infimum of the nonempty set of positive $x$'s for which $\cos x = 0$. We
can find a sequence $\{x_n\}$ with $x_n \to x_0$ and $\cos x_n = 0$ for all $n$. By continuity
$\cos x_0 = 0$. We know that $x_0 \ge 0$, and we must have $x_0 > 0$, since $\cos 0 = 1$.
This proves the existence of $x_0$.

Since $\sin x$ has derivative $\cos x$, which is positive for $0 \le x < \pi/2$, $\sin x$ is
strictly increasing for $0 \le x \le \pi/2$. From $\sin^2 x + \cos^2 x = 1$, we deduce that
$\sin(\pi/2) = 1$. By the Intermediate Value Theorem, $\sin x$ is one-one from $[0, \pi/2]$
onto $[0, 1]$. In similar fashion, $\cos x$ is strictly decreasing and one-one from
$[0, \pi/2]$ onto $[0, 1]$. This proves (a).

Conclusion (b) is immediate from the series expansions of $\sin x$ and $\cos x$.
Conclusion (c) follows from Corollary 1.43 and the facts that $\sin(\pi/2) = 1$ and
$\cos(\pi/2) = 0$. Conclusion (d) follows by applying (c) twice, and conclusion (e)
follows by applying (d) twice.


Corollary 1.45. The function $e^{ix}$, with $x$ real, has $|e^{ix}| = 1$ for all $x$, and
$x \mapsto e^{ix}$ is one-one from $[0, 2\pi)$ onto the unit circle of $\mathbb{C}$, i.e., the subset of
$z \in \mathbb{C}$ with $|z| = 1$.


PROOF. We have $|\cos x + i\sin x|^2 = \cos^2 x + \sin^2 x = 1$ and therefore
$|e^{ix}| = 1$. If $e^{ix_1} = e^{ix_2}$ with $x_1$ and $x_2$ in $[0, 2\pi)$, then $e^{it} = 1$ with
$t = x_1 - x_2$ in $(-2\pi, 2\pi)$. So $\cos t = 1$ and $\sin t = 0$. From Proposition 1.44
we see that the only possibility for $t \in (-2\pi, 2\pi)$ is $t = 0$. Thus $x_1 - x_2 = 0$,
and $x \mapsto e^{ix}$ is one-one.

Now let $x + iy$ have $x^2 + y^2 = 1$. First suppose that $x \ge 0$ and $y \ge 0$. Since
$0 \le y \le 1$, it follows that there exists $t \in [0, \pi/2]$ with $\sin t = y$. For this $t$,
the numbers $x$ and $\cos t$ are both $\ge 0$ and have square equal to $1 - y^2$. Thus
$x = \cos t$ and $e^{it} = x + iy$. For a general $x + iy$ with $x^2 + y^2 = 1$, at least
one of the complex numbers $i^n(x + iy)$ with $0 \le n \le 3$ has real and imaginary
parts $\ge 0$. Then $i^n(x + iy) = e^{it}$ for some $t$. Since $i = \cos\frac{\pi}{2} + i\sin\frac{\pi}{2} = e^{i\pi/2}$,
we see that $x + iy = e^{it}e^{-in\pi/2} = e^{it - in\pi/2}$. From $e^{i(x+2\pi)} = e^{ix}$, we can adjust
$it - in\pi/2$ additively by a multiple of $2\pi i$ so that the result $it'$ lies in $i[0, 2\pi)$,
and then $e^{it'} = x + iy$, as required.
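The construction in the second half of this proof can be traced numerically. The following Python sketch is only an illustration under stated assumptions: the function name `angle_from_unit_complex` is invented here, the input is assumed to satisfy $|z| = 1$ up to rounding, and the library function `math.asin` stands in for the inverse of $\sin$ on $[0, \pi/2]$.

```python
import cmath
import math

def angle_from_unit_complex(z):
    """Return t in [0, 2*pi) with e^{it} = z, assuming |z| = 1 up to rounding.

    Mirrors the proof: multiply z by a power i^n so that the result lies in the
    closed first quadrant, solve sin t = Im(i^n z) there, then undo the rotation.
    """
    rotations = [1, 1j, -1, -1j]              # i^n for n = 0, 1, 2, 3
    for n, rot in enumerate(rotations):
        w = rot * z
        if w.real >= 0 and w.imag >= 0:       # i^n z is in the closed first quadrant
            t = math.asin(min(1.0, w.imag))   # t in [0, pi/2] with sin t = Im(i^n z)
            return (t - n * math.pi / 2) % (2 * math.pi)
    raise ValueError("input is not a finite complex number")

# Usage: recover the angle of a point on the unit circle.
recovered = angle_from_unit_complex(cmath.exp(2.5j))
```

Multiplication by $1, i, -1, -i$ only permutes the real and imaginary parts and changes signs, so one of the four rotations always lands in the closed first quadrant.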


Corollary 1.46. 


(a) The function $\sin x$ carries $(-\frac{\pi}{2}, \frac{\pi}{2})$ onto $(-1, 1)$, has everywhere-positive
derivative, and has a differentiable inverse function $\arcsin x$ carrying $(-1, 1)$
one-one onto $(-\frac{\pi}{2}, \frac{\pi}{2})$. The derivative of $\arcsin x$ is $1/\sqrt{1 - x^2}$.

(b) The function $\tan x = (\sin x)/(\cos x)$ carries $(-\frac{\pi}{2}, \frac{\pi}{2})$ onto $(-\infty, +\infty)$,
has everywhere-positive derivative, and has a differentiable inverse function
$\arctan x$ carrying $(-\infty, +\infty)$ one-one onto $(-\frac{\pi}{2}, \frac{\pi}{2})$. The derivative of $\arctan x$ is
$1/(1 + x^2)$, and $\int_{-1}^{1} (1 + x^2)^{-1}\,dx = \pi/2$.


PROOF. From Proposition 1.44 we see that $\frac{d}{dx}(\sin x) = \cos x$ and
$\frac{d}{dx}(\tan x) = (\cos x)^{-2}$. The first of these is everywhere positive because of (a)
and (b) in the proposition, and the second is everywhere positive by inspection.
The image of $\sin x$ is $(-1, 1)$ by (a) and (b), and also the image of $\tan x$ is
$(-\infty, +\infty)$ by (a) and (b). Application of the proposition in Section A3 of
the appendix yields all the conclusions of the corollary except the formula for
$\int_{-1}^{1} (1 + x^2)^{-1}\,dx$. This integral is given by $\arctan 1 - \arctan(-1)$ by Theorem
1.32. Since $\tan(\pi/4) = \sin(\pi/4)/\cos(\pi/4)$, (c) in the proposition gives
$\tan(\pi/4) = 1$, and hence $\arctan 1 = \pi/4$. In addition,

\[ \tan(-\pi/4) = \frac{\sin(-\pi/4)}{\cos(-\pi/4)} = -\frac{\sin(\pi/4)}{\cos(\pi/4)} = -1, \]

and hence $\arctan(-1) = -\pi/4$. Therefore $\int_{-1}^{1} (1 + x^2)^{-1}\,dx =
(\pi/4) - (-\pi/4) = \pi/2$.
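The value of the integral in (b) is easy to check numerically. The Python sketch below is an illustration only; the helper `midpoint_integral` and its step count are choices made here, and the midpoint rule is used simply because it needs nothing beyond the standard library.

```python
import math

def midpoint_integral(g, a, b, steps=100000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# Integral of 1/(1 + x^2) over [-1, 1], to be compared with pi/2.
approx = midpoint_integral(lambda x: 1.0 / (1.0 + x * x), -1.0, 1.0)
via_arctan = math.atan(1.0) - math.atan(-1.0)
```

Both `approx` and `via_arctan` should agree with `math.pi / 2` to within the accuracy of the quadrature.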


A power series, even a Taylor series, may have any radius of convergence in
$[0, +\infty]$. Even if the radius of convergence is $> 0$, the series may not converge
to the given function. For example, Problems 20-22 at the end of the chapter ask
one to verify that the function

\[ f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0, \end{cases} \]

is infinitely differentiable, even at $x = 0$, and has $f^{(n)}(0) = 0$ for all $n$. Thus its
infinite Taylor series is identically 0, and the series evidently converges to $f(x)$
only for $x = 0$.

Because of Corollary 1.38, one is not restricted to a rote use of Taylor's formula
in order to compute Taylor series. If we are interested in the Taylor expansion
of $f$ about $x = 0$, any power series with a positive radius of convergence that
converges to $f$ on some open interval about $x = 0$ has to be the Taylor expansion of
$f$. A simple example is $e^{x^2}$, whose derivatives at $x = 0$ are a chore to compute.
However, $e^u = \sum_{n=0}^{\infty} \frac{u^n}{n!}$ for all $u$. If we put $u = x^2$, we obtain
$e^{x^2} = \sum_{n=0}^{\infty} \frac{x^{2n}}{n!}$ for all $x$. Therefore this series must be the infinite Taylor
series of $e^{x^2}$. Here is a more complicated example.


EXAMPLE. Binomial series. Let $p$ be any complex number, and put $F(x) =
(1 + x)^p$ for $-1 < x < 1$. We can compute the $n$th derivative of $F$ by inspection,
and we obtain $F^{(n)}(x) = p(p-1)\cdots(p-n+1)(1 + x)^{p-n}$. Therefore the
infinite Taylor series of $F$ about $x = 0$ is

\[ \sum_{n=0}^{\infty} \frac{p(p-1)\cdots(p-n+1)}{n!}\, x^n. \]

This series reduces to a polynomial if $p$ is a nonnegative integer, and the series
is genuinely infinite otherwise. The ratio test shows that the series converges for
$|x| < 1$; let $f(x)$ be its sum for $x$ real. The remainder term $R_n(0, x)$ is difficult to
estimate, and thus the relationship between the sum $f(x)$ and the original function
$(1 + x)^p$ is not immediately apparent. However, we can use Corollary 1.38 to
obtain

\[ f'(x) = \sum_{n=1}^{\infty} \frac{p(p-1)\cdots(p-n+1)\, n\, x^{n-1}}{n!} = \sum_{n=0}^{\infty} \frac{p(p-1)\cdots(p-n)\, x^n}{n!} \]

for $|x| < 1$. We compute $(1 + x) f'(x)$ by multiplying the first series by $x$, the
second series by 1, and adding. If we write the constant term separately, the result
is

\[ (1 + x) f'(x) = p + \sum_{n=1}^{\infty} \frac{p(p-1)\cdots(p-n+1)\,[n + (p-n)]}{n!}\, x^n = p f(x). \]

Therefore

\[ \frac{d}{dx}\big[(1 + x)^{-p} f(x)\big] = -p(1 + x)^{-p-1} f(x) + (1 + x)^{-p} f'(x)
= (1 + x)^{-p-1}\big[-p f(x) + (1 + x) f'(x)\big] = 0, \]

and $(1 + x)^{-p} f(x)$ has to be constant for $|x| < 1$. From the series whose sum is
$f(x)$, we see that $f(0) = 1$, and hence the constant is 1. Thus $f(x) = (1 + x)^p$,
and we have established the binomial series expansion

\[ (1 + x)^p = \sum_{n=0}^{\infty} \frac{p(p-1)\cdots(p-n+1)}{n!}\, x^n \]

for $-1 < x < 1$.

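A short numerical check of the binomial series may help fix ideas. The Python sketch below is illustrative only; the function name `binomial_partial_sum` is invented here, and the coefficients are generated from the ratio $(p - n)/(n + 1)$ of consecutive coefficients rather than from factorials.

```python
def binomial_partial_sum(p, x, terms):
    """Sum of the first `terms` terms of sum_n p(p-1)...(p-n+1)/n! * x^n."""
    total = 0.0
    coeff = 1.0                      # coefficient for n = 0
    power = 1.0                      # x^n
    for n in range(terms):
        total += coeff * power
        coeff *= (p - n) / (n + 1)   # ratio of coefficient n+1 to coefficient n
        power *= x
    return total

# Usage: compare partial sums with (1 + x)^p for real and complex exponents.
real_case = binomial_partial_sum(0.5, 0.3, 60)        # compare with 1.3 ** 0.5
complex_case = binomial_partial_sum(1 + 2j, 0.2, 60)  # compare with 1.2 ** (1 + 2j)
```

When $p$ is a nonnegative integer the coefficient becomes 0 after finitely many steps, which is the polynomial case mentioned in the example.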
8. Summability 


Summability refers to an operation on a sequence of complex numbers to make it 
more likely that the sequence will converge. The subject is of interest particularly 
with Fourier series, where the ordinary partial sums may not converge even at 
points where the given function is continuous. 

Let $\{s_n\}_{n \ge 0}$ be a sequence in $\mathbb{C}$, and define its sequence of Cesàro sums, or
arithmetic means, to be given by

\[ \sigma_n = \frac{s_0 + s_1 + \cdots + s_n}{n + 1} \]

for $n \ge 0$. If $\lim_n \sigma_n = \sigma$ exists in $\mathbb{C}$, we say that $\{s_n\}$ is Cesàro summable to the
limit $\sigma$. For example the sequence with $s_n = (-1)^n$ for $n \ge 0$ is not convergent,
but it is Cesàro summable to the limit 0 because $\sigma_n$ is 0 for all odd $n$ and is $\frac{1}{n+1}$
for all even $n$.


Theorem 1.47. If a complex sequence $\{s_n\}_{n \ge 0}$ is convergent in $\mathbb{C}$ to the limit
$s$, then $\{s_n\}$ is Cesàro summable to the limit $s$.


REMARK. The argument is a $2\epsilon$ proof, and two things are affecting $\sigma_n$. For $k$
small and fixed, the contribution of $s_k$ to $\sigma_n$ is $s_k/(n + 1)$ and is tending to 0. For
$k$ large, any $s_k$ is close to $s$, and the average of such terms is close to $s$.


PROOF. Let $\epsilon > 0$ be given, and choose $N_1$ such that $k \ge N_1$ implies $|s_k - s| <
\epsilon$. If $n > N_1$, then

\[ \sigma_n - s = \frac{(s_0 - s) + \cdots + (s_{N_1} - s)}{n + 1} + \frac{(s_{N_1+1} - s) + \cdots + (s_n - s)}{n + 1}, \]

so that

\[ |\sigma_n - s| \le \frac{|s_0| + \cdots + |s_{N_1}| + (N_1 + 1)|s|}{n + 1} + \frac{n - N_1}{n + 1}\,\epsilon
\le \frac{|s_0| + \cdots + |s_{N_1}| + (N_1 + 1)|s|}{n + 1} + \epsilon. \]

The numerator of the first term is fixed, and thus we can choose $N \ge N_1$ large
enough so that the first term is $< \epsilon$ whenever $n \ge N$. If $n \ge N$, then we see that
$|\sigma_n - s| < 2\epsilon$. Since $\epsilon$ is arbitrary, the theorem follows.
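The definition of the Cesàro sums and the statement of Theorem 1.47 can be tried out on concrete sequences. The following Python sketch is an illustration; the function name `cesaro_means` and the two sample sequences are choices made here.

```python
def cesaro_means(s):
    """Return [sigma_0, ..., sigma_n] with sigma_n = (s_0 + ... + s_n) / (n + 1)."""
    means, running = [], 0
    for n, term in enumerate(s):
        running += term
        means.append(running / (n + 1))
    return means

# The divergent sequence s_n = (-1)^n from the text: sigma_n is 0 for odd n
# and 1/(n + 1) for even n, so the means tend to 0.
sigma = cesaro_means([(-1) ** n for n in range(1001)])

# A convergent sequence s_n = 1 + 1/(n + 1) -> 1; its means also tend to 1,
# though more slowly, as the proof's first term suggests.
tau = cesaro_means([1 + 1 / (n + 1) for n in range(1001)])
```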


Next let $\{a_n\}_{n \ge 0}$ be a complex sequence, and let $\{s_n\}_{n \ge 0}$ be the sequence of
partial sums with $s_n = \sum_{k=0}^{n} a_k$. Form the power series $\sigma_r = \sum_{n=0}^{\infty} a_n r^n$. We
say that the sequence $\{s_n\}$ of partial sums is Abel summable to the limit $s$ in $\mathbb{C}$
if $\lim_{r \uparrow 1} \sigma_r = s$, i.e., if for each $\epsilon > 0$, there is some $r_0$ such that $r_0 \le r < 1$
implies that $|\sigma_r - s| < \epsilon$. For example, take $a_k = (-1)^k$, so that $s_n$ equals 1 if $n$
is even and equals 0 if $n$ is odd. The sequence $\{s_n\}$ of partial sums is divergent.
The $r$th Abel sum $\sigma_r$ is given by the geometric series $\sum_{k=0}^{\infty} (-1)^k r^k$ with sum
$1/(1 + r)$. Letting $r$ increase to 1, we see that $\{s_n\}$ is Abel summable with limit $\frac12$.
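The Abel sums in this example can be approximated by truncating the power series. The Python sketch below is illustrative; the function name `abel_sum` and the truncation length are assumptions made here, chosen so that $r^{\text{terms}}$ is negligible for the values of $r$ used.

```python
def abel_sum(a, r, terms=20000):
    """Truncated Abel sum sum_{n < terms} a(n) * r^n for 0 <= r < 1.

    `a` is a function n -> a_n; the truncation is an approximation to sigma_r.
    """
    total, power = 0.0, 1.0
    for n in range(terms):
        total += a(n) * power
        power *= r
    return total

# a_k = (-1)^k: the Abel sums equal 1/(1 + r) and approach 1/2 as r increases to 1.
values = {r: abel_sum(lambda n: (-1) ** n, r) for r in (0.5, 0.9, 0.99, 0.999)}
```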


Theorem 1.48 (Abel's Theorem). Let $\{a_n\}_{n \ge 0}$ be a complex sequence, and
let $\{s_n\}_{n \ge 0}$ be the sequence of partial sums with $s_n = \sum_{k=0}^{n} a_k$. If $\{s_n\}_{n \ge 0}$ is
convergent in $\mathbb{C}$ to the limit $s$, then $\{s_n\}$ is Abel summable to the limit $s$.


REMARK. The proof will proceed along the same lines as in the previous case.
It is first necessary to express the Abel sums $\sigma_r$ in terms of the $s_k$'s.


PROOF. Since $\{s_n\}$ converges, $\{s_n\}$ and $\{a_n\}$ are bounded, and thus $\sum_{n=0}^{\infty} s_n r^n$
and $\sum_{n=0}^{\infty} a_n r^n$ are absolutely convergent for $0 \le r < 1$. With $s_{-1} = 0$, write

\[ \sigma_r = \sum_{n=0}^{\infty} a_n r^n = \sum_{n=0}^{\infty} (s_n - s_{n-1}) r^n = \sum_{n=0}^{\infty} s_n r^n - r \sum_{n=0}^{\infty} s_n r^n \]
\[ = (1 - r) \sum_{n=0}^{\infty} s_n r^n = (1 - r) \sum_{k=0}^{N} r^k s_k + \sum_{k=N+1}^{\infty} (1 - r) r^k s_k. \]

Let $\epsilon > 0$ be given, and choose $N$ such that $k \ge N$ implies $|s_k - s| < \epsilon$. Then

\[ |\sigma_r - s| \le (1 - r) \sum_{k=0}^{N} (|s_k| + |s|) + \sum_{k=N+1}^{\infty} (1 - r) r^k |s_k - s| \]
\[ \le (1 - r) \sum_{k=0}^{N} (|s_k| + |s|) + (1 - r) \sum_{k=N+1}^{\infty} r^k \epsilon \]
\[ \le (1 - r) \sum_{k=0}^{N} (|s_k| + |s|) + \epsilon. \]

With $N$ fixed, the coefficient of $(1 - r)$ in the first term is fixed, and thus we can
choose $r_0$ close enough to 1 so that the first term is $< \epsilon$ whenever $r_0 \le r < 1$. If
$r_0 \le r < 1$, we see that $|\sigma_r - s| < 2\epsilon$. Since $\epsilon$ is arbitrary, the theorem follows.


EXAMPLE. For $|x| < 1$, the geometric series $\sum_{n=0}^{\infty} (-1)^n x^n$ converges and
has sum $(1 + x)^{-1}$. The Fundamental Theorem of Calculus gives $\log(1 + x) =
\int_0^x \frac{dt}{1 + t} = \int_0^x \sum_{n=0}^{\infty} (-1)^n t^n\,dt$ for $|x| < 1$, and Theorem 1.31 allows us to
interchange sum and integral as long as $|x| < 1$. Consequently

\[ \log(1 + x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n + 1}\, x^{n+1} \]

for $|x| < 1$. The sequence of partial sums on the right converges for $x = 1$ by
the Leibniz test, and Theorem 1.48 says that the Abel sums must converge to
the same limit. But the Abel sums have limit $\lim_{x \uparrow 1} \log(1 + x) = \log 2$, since
$\log(1 + x)$ is continuous for $x > 0$. Thus Abel's Theorem has given us a rigorous
proof of the familiar identity

\[ \sum_{n=0}^{\infty} \frac{(-1)^n}{n + 1} = \log 2. \]
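The two limits compared in this example, the ordinary sum at $x = 1$ and the limit of the Abel sums as $x$ increases to 1, can both be approximated directly. The Python sketch below is illustrative only; `log_series` is a name introduced here.

```python
import math

def log_series(x, terms):
    """Partial sum of sum_{n >= 0} (-1)^n x^(n+1) / (n + 1)."""
    total, power, sign = 0.0, x, 1.0
    for n in range(terms):
        total += sign * power / (n + 1)
        power *= x
        sign = -sign
    return total

inside = log_series(0.5, 100)          # compare with log(1.5)
at_one = log_series(1.0, 100000)       # slowly convergent partial sum at x = 1
abel_near_one = log_series(0.999, 20000)  # Abel sum for x slightly below 1
```

The partial sums at $x = 1$ approach $\log 2$ only at a rate comparable to the reciprocal of the number of terms, which is why the indirect argument through Abel's Theorem is attractive.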


Theorems 1.47 and 1.48, which say that one kind of convergence always 
implies another, are called Abelian theorems. Converse results, saying that the 
second kind of convergence implies the first under an additional hypothesis, are 
called Tauberian theorems. These tend to be harder to prove. We give two 
examples of Tauberian theorems; the first one will be applied immediately to 
yield an important special case of the main theorem of Section 9; the second one 
will be used in Chapter VI to prove a deep theorem about pointwise convergence 
of Fourier series.


Proposition 1.49. Let $\{a_n\}_{n \ge 0}$ be a complex sequence with all terms $\ge 0$, and
let $\{s_n\}_{n \ge 0}$ be the sequence of partial sums with $s_n = \sum_{k=0}^{n} a_k$. If $\{s_n\}_{n \ge 0}$ is Abel
summable in $\mathbb{C}$ to the limit $s$, then $\{s_n\}$ is convergent to the limit $s$.


PROOF. Let $\{r_j\}_{j \ge 0}$ be a sequence increasing to the limit 1. Since $a_n r_j^n$ is
nonnegative and since it is monotone increasing in $j$ for each $n$, Corollary 1.14
applies and gives $\lim_j \sum_{n=0}^{\infty} a_n r_j^n = \sum_{n=0}^{\infty} \lim_j a_n r_j^n$, the limits existing in $\mathbb{R}^*$.
The left side is the (finite) limit $s$ of the Abel sums, and the right side is $\lim s_n$,
which Corollary 1.14 is asserting exists.


EXAMPLE. The binomial series expansion in Section 7 shows, for any complex
$p$, that $(1 - r)^p$ is given for $-1 < r < 1$ by the absolutely convergent series

\[ (1 - r)^p = 1 + \sum_{n=1}^{\infty} (-1)^n \frac{p(p-1)\cdots(p-n+1)}{n!}\, r^n. \]

For $p$ real with $0 < p < 1$, inspection shows that all the coefficients in the sum
on the right are $\le 0$. Therefore

\[ \sum_{n=1}^{\infty} (-1)^{n+1} \frac{p(p-1)\cdots(p-n+1)}{n!}\, r^n \qquad (*) \]

has all coefficients $\ge 0$ if $0 < p < 1$. For $0 \le r < 1$, the sum of the series is
$1 - (1 - r)^p$ and is $\ge 0$. The fact that $\lim_{r \uparrow 1} [1 - (1 - r)^p] = 1$ means that the
sequence of partial sums $s_n = \sum_{k=1}^{n} (-1)^{k+1} \frac{p(p-1)\cdots(p-k+1)}{k!}$ is Abel summable
to 1. Proposition 1.49 shows that the series $(*)$ is convergent at $r = 1$, and
the Weierstrass M test shows that $(*)$ converges uniformly for $-1 \le r \le 1$ to
$1 - (1 - r)^p$. If we now take $p = \frac12$, we have

\[ (1 - r)^{1/2} = 1 - \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\frac12(-\frac12)(-\frac32)\cdots(\frac32 - n)}{n!}\, r^n \]
\[ = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\frac12(-\frac12)(-\frac32)\cdots(\frac32 - n)}{n!} - \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\frac12(-\frac12)(-\frac32)\cdots(\frac32 - n)}{n!}\, r^n \]
\[ = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\frac12(-\frac12)(-\frac32)\cdots(\frac32 - n)}{n!}\, (1 - r^n), \]

the series on the right being uniformly convergent for $-1 \le r \le 1$. Putting
$r = 1 - x^2$ therefore gives

\[ |x| = \sqrt{x^2} = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\frac12(-\frac12)(-\frac32)\cdots(\frac32 - n)}{n!}\, \big(1 - (1 - x^2)^n\big), \]

the series on the right being uniformly convergent for $-1 \le x \le 1$. Consequently
$|x|$ is the uniform limit of a sequence of polynomials on $[-1, 1]$, and all these
polynomials are in fact 0 at $x = 0$.
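The polynomials produced by this example can be evaluated explicitly. In the Python sketch below, which is an illustration rather than part of the argument, $a_n$ denotes the $n$th coefficient of $(*)$ for $p = \frac12$ (a name introduced here), generated from $a_1 = \frac12$ and the ratio $a_{n+1}/a_n = (n - \frac12)/(n + 1)$; the function names and the sampling grid are likewise assumptions.

```python
def abs_poly(x, N):
    """N-th partial sum sum_{n=1}^{N} a_n * (1 - (1 - x^2)^n) for p = 1/2."""
    a = 0.5                  # a_1 = p
    r = 1.0 - x * x
    power = r                # r^n
    total = 0.0
    for n in range(1, N + 1):
        total += a * (1.0 - power)
        a *= (n - 0.5) / (n + 1)   # a_{n+1} = a_n * (n - p) / (n + 1)
        power *= r
    return total

def sup_error(N, samples=2001):
    """Largest sampled value of | |x| - abs_poly(x, N) | on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(abs(x) - abs_poly(x, N)) for x in xs)

coarse, fine = sup_error(50), sup_error(400)
```

The sampled error shrinks as $N$ grows, but slowly, and every partial sum vanishes at $x = 0$ because each factor $1 - (1 - x^2)^n$ does.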


Proposition 1.50. Let $\{a_n\}_{n \ge 0}$ be a complex sequence, and let $\{s_n\}_{n \ge 0}$ be the
sequence of partial sums with $s_n = \sum_{k=0}^{n} a_k$. If $\{s_n\}$ is Cesàro summable to the
limit $s$ in $\mathbb{C}$ and if the sequence $\{n a_n\}$ is bounded, then $\{s_n\}$ is convergent and the
limit is $s$. The rate of convergence depends only on the bound for $\{n a_n\}$ and the
rate of convergence of the Cesàro sums.


REMARK. In our application in Chapter VI to pointwise convergence of Fourier
series, the sequence of partial sums will be of the form $\{s_n(x)\}$, depending on a
parameter $x$, and the statement about the rate of convergence will enable us to
see that the convergence of $\{s_n(x)\}$ is uniform in $x$ under suitable hypotheses.


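PROOF. Let $\{\sigma_n\}$ be the sequence of Cesàro sums of $\{s_n\}$, and choose $M$ such
that $|n a_n| \le M$ for all $n$. The first step is to establish a useful formula for
$s_n - \sigma_n$. Let $m$ be any integer with $0 \le m < n$. We start from the trivial identity
$-(n - m)\sigma_n = (m + 1)\sigma_n - (n + 1)\sigma_n$, add $(n - m)s_n$ to both sides, and regroup
as

\[ (n - m)(s_n - \sigma_n) = (m + 1)\sigma_n - (s_0 + \cdots + s_m) + (n - m)s_n - (s_{m+1} + \cdots + s_n) \]
\[ = (m + 1)(\sigma_n - \sigma_m) + \sum_{j=m+1}^{n} (s_n - s_j). \]

Dividing by $(n - m)$ yields

\[ s_n - \sigma_n = \frac{m + 1}{n - m}\,(\sigma_n - \sigma_m) + \frac{1}{n - m} \sum_{j=m+1}^{n} (s_n - s_j), \]

which is the identity from which the estimates begin.
For $m + 1 \le j \le n$, we have

\[ |s_n - s_j| \le |a_{j+1}| + |a_{j+2}| + \cdots + |a_n| \le \frac{M}{j + 1} + \frac{M}{j + 2} + \cdots + \frac{M}{n}
\le \frac{(n - j)M}{j + 1} \le \frac{(n - m - 1)M}{m + 2}. \]

Substituting into our identity yields

\[ |s_n - \sigma_n| \le \frac{m + 1}{n - m}\,|\sigma_n - \sigma_m| + \frac{(n - m - 1)M}{m + 2}. \]

Let $\epsilon > 0$ be given, and choose $N$ such that $|\sigma_k - s| \le \epsilon^2$ whenever $k \ge N$.
We may assume that $\epsilon < \frac12$ and $N \ge 4$. With $\epsilon$ fixed and with $n$ fixed to be
$\ge 2N$, define $m$ to be the unique integer with

\[ m \le \frac{n - \epsilon}{1 + \epsilon} < m + 1. \]

Then $0 \le m < n$, and our inequality for $|s_n - \sigma_n|$ applies. From the left inequality
$m \le \frac{n - \epsilon}{1 + \epsilon}$ defining $m$, we obtain $m + m\epsilon \le n - \epsilon$ and hence $(m + 1)\epsilon \le n - m$
and

\[ \frac{m + 1}{n - m} \le \epsilon^{-1}. \]

From the right inequality $\frac{n - \epsilon}{1 + \epsilon} < m + 1$ defining $m$, we obtain
$n - \epsilon < m + 1 + \epsilon m + \epsilon$ and hence $n - m - 1 < \epsilon(m + 2)$ and

\[ \frac{n - m - 1}{m + 2} < \epsilon. \]

Thus our main inequality becomes

\[ |s_n - \sigma_n| \le \epsilon^{-1} |\sigma_n - \sigma_m| + M\epsilon. \]

To handle $\sigma_m$, we need to bound $m$ below. We have seen that $n - m - 1 <
\epsilon(m + 2)$, and we have assumed that $\epsilon < \frac12$. Then $n - m - 1 < \frac12(m + 2)$, and
this simplifies to $m > \frac23 n - \frac43$, which is $\ge \frac{n}{2}$ if $n \ge 8$, thus certainly if $N \ge 4$. In
other words, $N \ge 4$ and $n \ge 2N$ makes $m \ge \frac{n}{2} \ge N$. Therefore $|\sigma_m - s| \le \epsilon^2$,
and $|\sigma_n - \sigma_m| \le 2\epsilon^2$. Substituting into our main inequality, we obtain

\[ |s_n - \sigma_n| \le \epsilon^{-1}\, 2\epsilon^2 + M\epsilon = (M + 2)\epsilon. \]

Since $\epsilon$ is arbitrary, the proof is complete.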
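The algebraic identity for $s_n - \sigma_n$ on which this proof rests holds for every sequence and every $m$ with $0 \le m < n$, so it can be spot-checked on random data. The Python sketch below is an illustration; the function name `identity_gap` and the random test sequence are choices made here.

```python
import random

def identity_gap(a, m, n):
    """Return |LHS - RHS| for the identity
    s_n - sigma_n = (m+1)/(n-m) * (sigma_n - sigma_m) + (1/(n-m)) * sum_{j=m+1}^{n} (s_n - s_j).
    """
    s, running = [], 0
    for term in a:
        running += term
        s.append(running)

    def sigma(k):
        # Cesaro sum (s_0 + ... + s_k) / (k + 1)
        return sum(s[: k + 1]) / (k + 1)

    lhs = s[n] - sigma(n)
    tail = sum(s[n] - s[j] for j in range(m + 1, n + 1))
    rhs = (m + 1) / (n - m) * (sigma(n) - sigma(m)) + tail / (n - m)
    return abs(lhs - rhs)

random.seed(0)
a = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
worst_gap = max(identity_gap(a, m, 49) for m in range(49))
```

The value `worst_gap` should be at the level of floating-point rounding, since the identity is exact.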


9. Weierstrass Approximation Theorem 


We saw as an application of Proposition 1.49 that the function $|x|$ on $[-1, 1]$ is the
uniform limit of an explicit sequence $\{P_n\}$ of polynomials with $P_n(0) = 0$. This
is a special case of a theorem of Weierstrass that any continuous complex-valued
function on a bounded interval is the uniform limit of polynomials on the interval.

The device for proving the Weierstrass theorem for a general continuous 
complex-valued function is to construct the approximating polynomials as the 
result of a smoothing process, known as the use of an “approximate identity.” 
The idea of an approximate identity is an important one in analysis and will occur 
several times in this book. If $f$ is the given function, the smoothing is achieved
by "convolution"

\[ \int f(x - t)\varphi(t)\,dt \]

of $f$ with some function $\varphi$, the integrals being taken over some particular intervals.
The resulting function of $x$ from the convolution turns out to be as "smooth" as
the smoother of $f$ and $\varphi$. In the case of the Weierstrass theorem, the function
$\varphi$ will be a polynomial, and we shall arrange parameters so that the convolution
will automatically be a polynomial.

To see how a polynomial $\int f(x - t)\varphi(t)\,dt$ might approximate $f$, one can
think of $\varphi$ as some kind of mass distribution; the mass is all nonnegative if
$\varphi \ge 0$. The integration produces a function of $x$ that is the "average" of translates
$x \mapsto f(x - t)$ of $f$, the average being computed according to the mass distribution
$\varphi$. If $\varphi$ has total mass 1, i.e., total integral 1, and most of the mass is concentrated
near $t = 0$, then $f$ is being replaced essentially by an average of its translates,
most of the translates being rather close to $f$, and we can expect the result to be
close to $f$.

For the Weierstrass theorem, we use a single starting $\varphi_1$ at stage 1, namely
$c_1(1 - x^2)$ on $[-1, 1]$ with $c_1$ chosen so that the total integral is 1. The graph of
$\varphi_1$ is a familiar inverted parabola, with the appearance of a bump centered at the
origin. The function at stage $n$ is $c_n(1 - x^2)^n$, with $c_n$ chosen so that the total
integral is 1. Graphs for $n = 3$ and $n = 30$ appear in Figure 1.1. The bump near
the origin appears to be more pronounced as $n$ increases, and what we need to do
is to translate the above motivation into a proof.


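Lemma 1.51. If $c_n$ is chosen so that $c_n \int_{-1}^{1} (1 - x^2)^n\,dx = 1$, then $c_n < e\sqrt{n}$
for $n$ sufficiently large.


PROOF. We have

\[ c_n^{-1} = \int_{-1}^{1} (1 - x^2)^n\,dx \ge \int_{-1/\sqrt{n}}^{1/\sqrt{n}} (1 - x^2)^n\,dx = 2 \int_0^{1/\sqrt{n}} (1 - x^2)^n\,dx \]
\[ \ge 2 \int_0^{1/\sqrt{n}} \Big(1 - \frac1n\Big)^n\,dx = 2\Big(1 - \frac1n\Big)^n \Big/ \sqrt{n}. \]

Since $(1 - \frac1n)^n \to e^{-1}$, we have $(1 - \frac1n)^n > \frac12 e^{-1}$ for $n$ large enough (actually
for $n \ge 2$). Therefore $c_n^{-1} > e^{-1}/\sqrt{n}$ for $n$ large enough, and hence $c_n < e\sqrt{n}$
for $n$ large enough. This proves the lemma.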
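The bound of the lemma can be compared with the actual size of $c_n$. The Python sketch below is an illustration under one stated assumption: it computes $I_n = \int_{-1}^{1} (1 - x^2)^n\,dx$ from $I_0 = 2$ and the recurrence $I_n = I_{n-1} \cdot 2n/(2n + 1)$, which follows from integration by parts but is not derived in the text. The function name `landau_normalizer` is invented here.

```python
import math

def landau_normalizer(n):
    """c_n with c_n * integral_{-1}^{1} (1 - x^2)^n dx = 1.

    Assumes the recurrence I_n = I_{n-1} * 2n / (2n + 1), I_0 = 2, for the integral.
    """
    integral = 2.0
    for k in range(1, n + 1):
        integral *= 2.0 * k / (2.0 * k + 1.0)
    return 1.0 / integral

# Ratio c_n / sqrt(n) for a few n; the lemma only needs this to stay below e.
ratios = {n: landau_normalizer(n) / math.sqrt(n) for n in (1, 10, 100, 1000)}

# The quantity c_n * (1 - delta^2)^n for delta = 0.1 and large n is tiny.
tail_at_delta = landau_normalizer(2000) * (1.0 - 0.1 ** 2) ** 2000
```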


FIGURE 1.1. Approximate identity. Graphs of $c_n(1 - x^2)^n$ for $n = 3$ and
$n = 30$ with different scales used on the vertical axes.

Let $\varphi_n(x) = c_n(1 - x^2)^n$ on $[-1, 1]$, with $c_n$ as in the lemma. The polynomials
$\varphi_n$ have the following properties:

(i) $\varphi_n(x) \ge 0$,
(ii) $\int_{-1}^{1} \varphi_n(x)\,dx = 1$,
(iii) for any $\delta > 0$, $\sup\{\varphi_n(x) \mid \delta \le |x| \le 1\}$ tends to 0 as $n$ tends to infinity.

Lemma 1.51 is used to verify (iii): the quantity

\[ \sup\{\varphi_n(x) \mid \delta \le |x| \le 1\} = c_n(1 - \delta^2)^n \]

tends to 0 because $\lim_n \sqrt{n}\,(1 - \delta^2)^n = 0$. A function with the above three
properties will be called an approximate identity on $[-1, 1]$.


Theorem 1.52 (Weierstrass Approximation Theorem). Any complex-valued 
continuous function on a bounded interval [a, b] is the uniform limit of a sequence 
of polynomials. 


PROOF. In order to arrange for the convolution to be a polynomial, we need 
to make some preliminary normalizations. Approximating f(x) on [a, b] by 
P(x) uniformly within € is the same as approximating f(x + a) on [0, b — a] 
by P(x + a) uniformly within €, and approximating g(x) on [0, c] uniformly by 
Q(x) is the same as approximating g(cx) uniformly by Q(cx). Thus we may 
assume without loss of generality that [a, b] = [0, 1]. 

If $h : [0, 1] \to \mathbb{C}$ is continuous and if $r$ is the function defined by $r(x) =
h(x) - h(0) - [h(1) - h(0)]x$, then $r$ is continuous with $r(0) = r(1) = 0$.
Approximating $h(x)$ on $[0, 1]$ uniformly by $R(x)$ is the same as approximating
$r(x)$ on $[0, 1]$ uniformly by $R(x) - h(0) - [h(1) - h(0)]x$. Thus we may assume
without loss of generality that the function to be approximated has value 0 at 0
and 1.

Let $f : [0, 1] \to \mathbb{C}$ be a given continuous function with $f(0) = f(1) = 0$; the
function $f$ is uniformly continuous by Theorem 1.10. We extend $f$ to the whole
line by making it be 0 outside $[0, 1]$, and the uniform continuity is maintained.

Now let $\varphi_n$ be the polynomial above, and put $P_n(x) = \int_{-1}^{1} f(x - t)\varphi_n(t)\,dt$.
Let us see that $P_n$ is a polynomial. By our definition of the extended $f$, the
integrand is 0 for a particular $x \in [0, 1]$ unless $t$ is in $[x - 1, x]$ as well as $[-1, 1]$.
We change variables, replacing $t$ by $s + x$ and making use of Theorem 1.34, and
the integral becomes $P_n(x) = \int f(-s)\varphi_n(s + x)\,ds$, the integral being taken for
$s$ in $[-1, 0] \cap [-1 - x, 1 - x]$. Since $x$ is in $[0, 1]$, the condition on $s$ is that $s$ is in
$[-1, 0]$. Thus $P_n(x) = \int_{-1}^{0} f(-s)\varphi_n(s + x)\,ds$. In this integral, $\varphi_n(x)$ is a linear
combination of monomials $x^k$, and $x^k$ itself contributes $\int_{-1}^{0} f(-s)(x + s)^k\,ds$,
which expands out to be a polynomial in $x$. Thus $P_n(x)$ is a polynomial in $x$.

By property (ii) of $\varphi_n$, we have

\[ P_n(x) - f(x) = \int_{-1}^{1} f(x - t)\varphi_n(t)\,dt - f(x) = \int_{-1}^{1} [f(x - t) - f(x)]\varphi_n(t)\,dt. \]

Then property (i) gives

\[ |P_n(x) - f(x)| \le \int_{-1}^{1} |f(x - t) - f(x)|\,\varphi_n(t)\,dt \]
\[ = \int_{-\delta}^{\delta} |f(x - t) - f(x)|\,\varphi_n(t)\,dt + \Big(\int_{\delta}^{1} + \int_{-1}^{-\delta}\Big) |f(x - t) - f(x)|\,\varphi_n(t)\,dt, \]

and two further uses of property (ii) show that this is

\[ \le \sup_{|t| \le \delta} |f(x - t) - f(x)| + 4 \Big(\sup_{y} |f(y)|\Big)\Big(\sup_{\delta \le |t| \le 1} \varphi_n(t)\Big). \]

Given $\epsilon > 0$, we choose some $\delta$ of uniform continuity for $f$ and $\epsilon$, and then the
first term is $< \epsilon$. With $\delta$ fixed, we use property (iii) of $\varphi_n$ and the boundedness
of $f$, given by Theorem 1.11, to produce an integer $N$ such that the second term
is $< \epsilon$ for $n \ge N$. Then $n \ge N$ implies that the displayed expression is $< 2\epsilon$.
Since $\epsilon$ is arbitrary, $P_n$ converges uniformly to $f$.
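The convolution construction in this proof can be imitated numerically. The Python sketch below is an illustration under stated assumptions: the sample function $f(x) = \sin(\pi x)$ on $[0, 1]$ (extended by 0), the midpoint rule for the integral, the sampling grid, and all function names are choices made here, and $c_n$ is computed from the recurrence $I_n = I_{n-1} \cdot 2n/(2n + 1)$, $I_0 = 2$, for $\int_{-1}^{1} (1 - x^2)^n\,dx$, which is not derived in the text.

```python
import math

def normalizer(n):
    """c_n with c_n * integral_{-1}^{1} (1 - x^2)^n dx = 1 (assumed recurrence)."""
    integral = 2.0
    for k in range(1, n + 1):
        integral *= 2.0 * k / (2.0 * k + 1.0)
    return 1.0 / integral

def f(x):
    """Sample continuous function with f(0) = f(1) = 0, extended by 0 off [0, 1]."""
    return math.sin(math.pi * x) if 0.0 <= x <= 1.0 else 0.0

def P(n, x, steps=2000):
    """Midpoint-rule value of integral_{-1}^{1} f(x - t) * c_n * (1 - t^2)^n dt."""
    c = normalizer(n)
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        t = -1.0 + (i + 0.5) * h
        total += f(x - t) * c * (1.0 - t * t) ** n
    return total * h

def sup_error(n, samples=41):
    """Largest sampled value of |P_n(x) - f(x)| for x in [0, 1]."""
    pts = [j / (samples - 1) for j in range(samples)]
    return max(abs(P(n, x) - f(x)) for x in pts)

err_50, err_400 = sup_error(50), sup_error(400)
```

For this sample function the largest sampled error occurs near the endpoints 0 and 1, where the extended $f$ has corners, and it decreases as $n$ grows, in line with the two-term estimate in the proof.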


10. Fourier Series 


A trigonometric series is a series of the form $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ with complex
coefficients. The individual terms of the series thus form a doubly infinite sequence,
but the sequence of partial sums is always understood to be the sequence $\{s_N\}_{N=0}^{\infty}$
with $s_N(x) = \sum_{n=-N}^{N} c_n e^{inx}$. Such a series may also be written as

\[ \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) \]

by putting

\[ e^{inx} = \cos nx + i \sin nx \quad\text{and}\quad e^{-inx} = \cos nx - i \sin nx \quad\text{for } n > 0, \]
\[ c_0 = \tfrac12 a_0, \quad c_n = \tfrac12 (a_n - i b_n), \quad\text{and}\quad c_{-n} = \tfrac12 (a_n + i b_n) \quad\text{for } n > 0. \]

Historically the notation with the $a_n$'s and $b_n$'s was introduced first, but the use of
complex exponentials has become quite common. Nowadays the notation with
$a_n$'s and $b_n$'s tends to be used only when a function $f$ under investigation is
real-valued or when all the cosine terms are absent (i.e., $f$ is odd) or all the sine
terms are absent (i.e., $f$ is even).

Power series enable us to enlarge our repertory of explicit functions, and the 
same thing is true of trigonometric series. Just as the coefficients of a power 
series whose sum is a function f have to be those arising from Taylor’s formula 
for f, the coefficients of a trigonometric series formed from a function have to 
arise from specific formulas. Let us run through the relevant formal computation: 
First we observe that the partial sums have to be periodic with period $2\pi$. The
question then is the extent to which a complex-valued periodic function $f$ on the
real line can be given by a trigonometric series. Suppose that

\[ f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}. \]

Multiply by $e^{-ikx}$ and integrate to get

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx}\,dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{n=-\infty}^{\infty} c_n e^{inx} e^{-ikx}\,dx. \]

If we can interchange the order of the integration and the infinite sum, e.g., if the
trigonometric series is uniformly convergent to $f(x)$, the right side is

\[ = \sum_{n=-\infty}^{\infty} c_n\, \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} e^{-ikx}\,dx = \sum_{n=-\infty}^{\infty} c_n\, \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(n-k)x}\,dx = c_k \]

because

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{imx}\,dx = \begin{cases} 1 & \text{if } m = 0, \\ 0 & \text{if } m \ne 0. \end{cases} \]


Let $f$ be Riemann integrable on $[-\pi, \pi]$, and regard $f$ as periodic on $\mathbb{R}$. The
trigonometric series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ with

\[ c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx \]

is called the Fourier series of $f$. We write

\[ f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx} \quad\text{and}\quad s_N(f; x) = \sum_{n=-N}^{N} c_n e^{inx}. \]

The numbers $c_n$ are the Fourier coefficients of $f$, and the functions $s_N(f; x)$ are
the partial sums of the Fourier series. The symbol $\sim$ is to be read as "has Fourier
series," nothing more, at least initially. The formulas for the coefficients when
the Fourier series is written with sines and cosines are

\[ a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx\,dx \quad\text{for } n \ge 0, \]
\[ b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx\,dx \quad\text{for } n \ge 1. \]

In applications one encounters periodic functions of periods other than $2\pi$. If
$f$ is periodic of period $2l$, then the Fourier series of $f$ is $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{i\pi nx/l}$
with $c_n = (2l)^{-1} \int_{-l}^{l} f(x) e^{-i\pi nx/l}\,dx$. The formula for the series written with
sines and cosines is $f(x) \sim a_0/2 + \sum_{n=1}^{\infty} (a_n \cos(n\pi x/l) + b_n \sin(n\pi x/l))$ with
$a_n = l^{-1} \int_{-l}^{l} f(x) \cos(n\pi x/l)\,dx$ and $b_n = l^{-1} \int_{-l}^{l} f(x) \sin(n\pi x/l)\,dx$. In the
present section of the text, we shall assume that our periodic functions have period
$2\pi$.

The result implicit in the formal computation above is that if f(x) is the sum 
of a uniformly convergent trigonometric series, then the trigonometric series is 
the Fourier series of f, by Theorem 1.31. 

We ask two questions: When does a general Fourier series converge? If the 
Fourier series converges, to what extent does the sum represent f? We begin 
with an illuminating example that brings together a number of techniques from 
this chapter. 


EXAMPLE. As in the example following Theorem 1.48, we have

\[ \log\Big(\frac{1}{1 - x}\Big) = x + \tfrac12 x^2 + \tfrac13 x^3 + \cdots \quad\text{for } -1 < x < 1. \]

We would like to extend this identity to complex $z$ with $|z| < 1$ but do not want to
attack the problem of making sense out of log as a function of a complex variable.
What we do is apply exp to both sides and obtain an identity for which both sides
make sense when the real $x$ is replaced by a complex $z$:

\[ \exp\big(z + \tfrac12 z^2 + \tfrac13 z^3 + \cdots\big) = \frac{1}{1 - z} \quad\text{for } |z| < 1. \]

In fact, this identity is valid for $z$ complex with $|z| < 1$, and Problems 30-35
at the end of the chapter lead to a proof of it. Corollary 1.45 allows us to write
$z = re^{i\theta}$ and $z + \frac12 z^2 + \frac13 z^3 + \cdots = \rho e^{i\varphi}$. Equating real and imaginary parts of
the latter equation gives us

\[ \rho \cos\varphi = \sum_{n=1}^{\infty} \frac{r^n \cos n\theta}{n} \quad\text{and}\quad \rho \sin\varphi = \sum_{n=1}^{\infty} \frac{r^n \sin n\theta}{n}. \]

We shall compute the left sides of these displayed equations in another way. We
have

\[ e^{\rho\cos\varphi} e^{i\rho\sin\varphi} = \exp(\rho\cos\varphi + i\rho\sin\varphi) = \exp(\rho e^{i\varphi}) = (1 - z)^{-1} \]

and therefore also $e^{\rho\cos\varphi} e^{-i\rho\sin\varphi} = (1 - \bar z)^{-1}$. Thus

\[ e^{2\rho\cos\varphi} = (1 - z)^{-1}(1 - \bar z)^{-1} = (1 - re^{i\theta})^{-1}(1 - re^{-i\theta})^{-1} = (1 - 2r\cos\theta + r^2)^{-1}. \]

Taking log of both sides gives $2\rho\cos\varphi = \log\big((1 - 2r\cos\theta + r^2)^{-1}\big)$, and thus
we have

\[ \frac12 \log\Big(\frac{1}{1 - 2r\cos\theta + r^2}\Big) = \sum_{n=1}^{\infty} \frac{r^n \cos n\theta}{n}. \qquad (*) \]

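Handling $\rho\sin\varphi$ is a little harder. From $e^{\rho\cos\varphi} e^{i\rho\sin\varphi} = (1 - z)^{-1}$, we have

\[ e^{i\rho\sin\varphi} = (1 - z)^{-1} / |1 - z|^{-1} = (1 - \bar z)/|1 - z| = \frac{1 - r\cos\theta + i r\sin\theta}{|1 - z|}, \]

and hence

\[ \cos(\rho\sin\varphi) = (1 - r\cos\theta)/|1 - z| \quad\text{and}\quad \sin(\rho\sin\varphi) = (r\sin\theta)/|1 - z|. \]

Thus $\tan(\rho\sin\varphi) = r\sin\theta/(1 - r\cos\theta)$. Since $1 - r\cos\theta$ is $> 0$, $\cos(\rho\sin\varphi)$ is
$> 0$, and $\rho\sin\varphi = \arctan\big((r\sin\theta)/(1 - r\cos\theta)\big) + 2\pi N(r, \theta)$ for some integer
$N(r, \theta)$ depending on $r$ and $\theta$. Hence

\[ \arctan\big((r\sin\theta)/(1 - r\cos\theta)\big) + 2\pi N(r, \theta) = \sum_{n=1}^{\infty} \frac{r^n \sin n\theta}{n}. \]

For fixed $r$, the first term on the left is continuous in $\theta$, and the series on the
right is uniformly convergent by the Weierstrass M test. By Theorem 1.21 the
right side is continuous in $\theta$. Thus $N(r, \theta)$ is continuous in $\theta$ for fixed $r$; since
$N(r, 0) = 0$, $N(r, \theta) = 0$ for all $r$ and $\theta$. We conclude that

\[ \arctan\Big(\frac{r\sin\theta}{1 - r\cos\theta}\Big) = \sum_{n=1}^{\infty} \frac{r^n \sin n\theta}{n}. \qquad (**) \]
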
Problem 15 at the end of the chapter observes that the partial sums $\sum_{n=1}^{N} \cos n\theta$
and $\sum_{n=1}^{N} \sin n\theta$ are uniformly bounded on any set $\epsilon \le \theta \le \pi - \epsilon$ if $\epsilon > 0$.
Corollary 1.19 therefore shows that the series

\[ \sum_{n=1}^{\infty} \frac{\cos n\theta}{n} \quad\text{and}\quad \sum_{n=1}^{\infty} \frac{\sin n\theta}{n} \]

are uniformly convergent for $\epsilon \le \theta \le \pi - \epsilon$ if $\epsilon > 0$. Abel's Theorem (Theorem
1.48) shows that each of these series is therefore Abel summable with the same
limit. We can tell what the latter limits are from $(*)$ and $(**)$, and thus we
conclude that

\[ \frac12 \log\Big(\frac{1}{2 - 2\cos\theta}\Big) = \sum_{n=1}^{\infty} \frac{\cos n\theta}{n} \]

and

\[ \arctan\Big(\frac{\sin\theta}{1 - \cos\theta}\Big) = \sum_{n=1}^{\infty} \frac{\sin n\theta}{n}. \]


The sum of the series with the cosine terms is unbounded near $\theta = 0$, and
Riemann integration is not meaningful with it. We shall not be able to analyze
this series further until we can treat the left side in Chapter VI by means of
Lebesgue integration. The sum of the series with the sine terms is written in a
way that stresses its periodicity. On the interval $[-\pi, \pi]$, we can rewrite its left
side as $\frac12(-\pi - \theta)$ for $-\pi \le \theta < 0$, 0 for $\theta = 0$, and $\frac12(\pi - \theta)$ for $0 < \theta \le \pi$.
The expression for the left side is nicer on the interval $(0, 2\pi)$, and there we have

\[ \frac12(\pi - \theta) = \sum_{n=1}^{\infty} \frac{\sin n\theta}{n} \quad\text{for } 0 < \theta < 2\pi. \]


The function $\frac12(\pi - \theta)$ is bounded on $(0, 2\pi)$, and we can readily compute its
Fourier coefficients from the formula $b_n = \pi^{-1} \int_0^{2\pi} \frac12(\pi - \theta) \sin n\theta\,d\theta$, using
integration by parts (Corollary 1.33). The result is that $b_n = 1/n$. Hence the
displayed series is the Fourier series. Graphs of some of the partial sums appear
in Figure 1.2.


FIGURE 1.2. Fourier series of sawtooth function. Graphs of
$\sum_{n=1}^{N} (\sin nx)/n$ for $N = 3, 5, 10, 30$.

The sawtooth function in the above example has a discontinuity, and yet its
Fourier series converges to it pointwise. The recognition of the remarkable
potential that Fourier series have for representing discontinuous functions dates
to Joseph Fourier himself and caused many of Fourier's contemporaries to doubt
the validity of his work.

Although the above Fourier series converges to the function, it cannot do so 
uniformly, as a consequence of Theorem 1.21. In any such situation the Fourier 
coefficients cannot decrease rapidly, and a decrease of order 1/n is the best that 
one gets for a nice function with a jump discontinuity. 
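Both phenomena, pointwise convergence and the failure of uniform convergence, are visible in a direct computation with the sawtooth series. The Python sketch below is an illustration; the function name `sawtooth_partial_sum` and the sample points are choices made here. The persistent error near the jump is the overshoot usually called the Gibbs phenomenon, a term not used in the text above.

```python
import math

def sawtooth_partial_sum(theta, N):
    """Partial sum sum_{n=1}^{N} sin(n * theta) / n of the sawtooth Fourier series."""
    return sum(math.sin(n * theta) / n for n in range(1, N + 1))

def target(theta):
    """The limit (pi - theta) / 2 on the interval 0 < theta < 2*pi."""
    return 0.5 * (math.pi - theta)

# At a fixed point away from the jump the error tends to 0 as N grows.
pointwise_error = abs(sawtooth_partial_sum(1.0, 2000) - target(1.0))

# At the moving point theta = pi / N the error does not shrink with N,
# so the convergence cannot be uniform on (0, 2*pi).
moving_errors = [abs(sawtooth_partial_sum(math.pi / N, N) - target(math.pi / N))
                 for N in (50, 500)]
```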

This example points to a general heuristic principle contrasting how power 
series and trigonometric series behave: whereas Taylor series converge very 
rapidly and may not converge to the function, Fourier series are inclined to 
converge rather slowly and they are more likely to converge to the function. 

We come to convergence results in a moment. First we establish some
elementary properties of Fourier coefficients. Taking the absolute value of $c_n$ in
the definition of Fourier coefficient, we obtain the trivial bound
$|c_n| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|\,dx$.


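Theorem 1.53. Let $f$ be in $\mathcal{R}[-\pi, \pi]$. Among all choices of $d_{-N}, \ldots, d_N$,
the expression

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \Big| f(x) - \sum_{n=-N}^{N} d_n e^{inx} \Big|^2\,dx \]

is minimized uniquely by choosing $d_n$, for all $n$ with $|n| \le N$, to be the Fourier
coefficient $c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx$. The minimum value is

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx - \sum_{n=-N}^{N} |c_n|^2. \]

PROOF. Put $d_n = c_n + \varepsilon_n$. Then

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \Big| f(x) - \sum_{n=-N}^{N} d_n e^{inx} \Big|^2\,dx \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx - 2\,\mathrm{Re} \sum_{n=-N}^{N} \bar d_n\, \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx}\,dx
+ \frac{1}{2\pi} \sum_{m,n=-N}^{N} d_m \bar d_n \int_{-\pi}^{\pi} e^{i(m-n)x}\,dx \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx - 2\,\mathrm{Re} \sum_{n=-N}^{N} c_n \bar d_n + \sum_{n=-N}^{N} |d_n|^2 \]
\[ = \Big( \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx \Big) - \Big( 2 \sum_{n=-N}^{N} |c_n|^2 + 2\,\mathrm{Re} \sum_{n=-N}^{N} c_n \bar\varepsilon_n \Big)
+ \Big( \sum_{n=-N}^{N} |c_n|^2 + 2\,\mathrm{Re} \sum_{n=-N}^{N} c_n \bar\varepsilon_n + \sum_{n=-N}^{N} |\varepsilon_n|^2 \Big) \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx - \sum_{n=-N}^{N} |c_n|^2 + \sum_{n=-N}^{N} |\varepsilon_n|^2. \]

The result follows.


Corollary 1.54 (Bessel's inequality). Let $f$ be in $\mathcal{R}[-\pi, \pi]$, and let $f(x) \sim
\sum_{n=-\infty}^{\infty} c_n e^{inx}$. Then

\[ \sum_{n=-\infty}^{\infty} |c_n|^2 \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx. \]

In particular, $\sum_{n=-\infty}^{\infty} |c_n|^2$ is finite.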


REMARK. In terms of the coefficients $a_n$ and $b_n$, the corresponding result is

\[ \frac{|a_0|^2}{2} + \sum_{n=1}^{\infty} (|a_n|^2 + |b_n|^2) \le \frac{1}{\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx. \]


PROOF. The theorem shows that the minimum value of a certain nonnegative
quantity depending on $N$ is $\frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx - \sum_{n=-N}^{N} |c_n|^2$. Thus, for any $N$,
$\sum_{n=-N}^{N} |c_n|^2 \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx$. Letting $N$ tend to infinity, we obtain the
corollary.


Corollary 1.55 (Riemann-Lebesgue Lemma). If $f$ is in $\mathcal{R}[-\pi, \pi]$ and has
Fourier coefficients $\{c_n\}_{n=-\infty}^{\infty}$, then $\lim_{|n| \to \infty} c_n = 0$.


REMARK. This improves on the inequality $|c_n| \le \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|\,dx$ observed
above, which shows, by means of an explicit estimate, that $\{c_n\}$ is a bounded
sequence.


PROOF. This is immediate from Corollary 1.54.
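Bessel's inequality and the Riemann-Lebesgue Lemma can be observed on a concrete function. The Python sketch below is an illustration under stated assumptions: the sample function $f(x) = e^x$ on $[-\pi, \pi]$, the midpoint rule used for every integral, and the function names are all choices made here.

```python
import cmath
import math

def fourier_coefficient(f, n, steps=4000):
    """Midpoint-rule approximation of c_n = (1/2pi) * integral_{-pi}^{pi} f(x) e^{-inx} dx."""
    h = 2 * math.pi / steps
    total = 0j
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / (2 * math.pi)

def mean_square(f, steps=4000):
    """Midpoint-rule approximation of (1/2pi) * integral_{-pi}^{pi} |f(x)|^2 dx."""
    h = 2 * math.pi / steps
    return sum(abs(f(-math.pi + (i + 0.5) * h)) ** 2 for i in range(steps)) * h / (2 * math.pi)

N = 20
coeffs = {n: fourier_coefficient(math.exp, n) for n in range(-N, N + 1)}
bessel_sum = sum(abs(c) ** 2 for c in coeffs.values())   # sum over |n| <= N of |c_n|^2
energy = mean_square(math.exp)                           # right side of Bessel's inequality
```

The partial sum `bessel_sum` stays below `energy`, and the magnitudes `abs(coeffs[n])` decrease as $|n|$ grows, as the two corollaries predict.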


We now turn to convergence results. First it is necessary to clarify terms like
“continuous” and “differentiable” in the context of Fourier series of functions
on $[-\pi, \pi]$. Each term of a Fourier series is defined on all of $\mathbb{R}$ and is periodic
with period $2\pi$ and is really given as the restriction to $[-\pi, \pi]$ of this periodic
function. Thus it makes sense to regard a general function in the same way if
one wants to form its Fourier series: a function $f$ is extended to all of $\mathbb{R}$ so as
to be periodic with period $2\pi$, and if we consider $f$ on $[-\pi, \pi]$, it is really the
restriction to $[-\pi, \pi]$ that we are considering.

In particular, it makes sense to insist that $f(-\pi) = f(\pi)$; if $f$ does not
have this property initially, one or both of these endpoint values will have to be
adjusted, but that adjustment will not affect any Fourier coefficients. Similarly
continuity of $f$ will refer to continuity of the extended function on all of $\mathbb{R}$, and
similarly for differentiability.

That being said, let us take up the matter of integration by parts for the functions we are considering. The scope of integration by parts in Corollary 1.33 was limited to a pair of functions $f$ and $g$ that have a continuous first derivative. In the context of Fourier series, it is the periodic extensions that are to have these properties, and then the integration-by-parts formula simplifies. Namely,

$$\int_{-\pi}^{\pi} f(x)g'(x)\,dx = \big[f(x)g(x)\big]_{-\pi}^{\pi} - \int_{-\pi}^{\pi} f'(x)g(x)\,dx = -\int_{-\pi}^{\pi} f'(x)g(x)\,dx,$$

i.e., the integrated term drops out because of the assumed periodicity.

The simplest convergence result for Fourier series is that a periodic function (of period $2\pi$) with two continuous derivatives has a uniformly convergent Fourier series. To prove this, we take $n \ne 0$ and use the above integration-by-parts formula twice to obtain

$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = -\frac{1}{2\pi}\Big(\frac{1}{-in}\Big)\int_{-\pi}^{\pi} f'(x)e^{-inx}\,dx = \frac{1}{2\pi}\Big(\frac{1}{-in}\Big)^2\int_{-\pi}^{\pi} f''(x)e^{-inx}\,dx.$$

Then $|c_n e^{inx}| = |c_n| \le C/n^2$, where $C = \frac{1}{2\pi}\int_{-\pi}^{\pi}|f''(x)|\,dx$, and the Fourier series converges uniformly by the Weierstrass M test. The argument does not say that the convergence is to $f$, but that fact will be proved in Theorem 1.57 below.
Adjusting the proof just given, we can prove a sharper convergence result. 


Proposition 1.56. If $f$ is periodic (of period $2\pi$) and has one continuous derivative, then the Fourier series of $f$ converges uniformly.

PROOF. As in the above argument, $c_n = -\frac{1}{2\pi}\big(\frac{1}{-in}\big)\int_{-\pi}^{\pi} f'(x)e^{-inx}\,dx$ for $n \ne 0$, and this equals $\frac{1}{in}d_n$, where $d_n$ is the $n$th Fourier coefficient of the continuous function $f'$. In the computation that follows, we use the classical Schwarz inequality (as in Section A5 of the appendix) for finite sums and pass to the limit in order to get the first inequality, and then we use Bessel’s inequality (Corollary 1.54) to get the second inequality:

$$\begin{aligned}
\sum_{n\ne0}|c_n| = \sum_{n\ne0}\frac{1}{|n|}\,|d_n|
&\le \Big(\sum_{n\ne0}\frac{1}{n^2}\Big)^{1/2}\Big(\sum_{n\ne0}|d_n|^2\Big)^{1/2} \\
&\le \Big(\sum_{n\ne0}\frac{1}{n^2}\Big)^{1/2}\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f'(x)|^2\,dx\Big)^{1/2}.
\end{aligned}$$

The right side is finite, and the proposition follows from the Weierstrass M test.
The fact that the convergence in Proposition 1.56 is actually to $f$ will follow from Dini’s test, which is Theorem 1.57 below. We first derive some simple formulas. The Dirichlet kernel is the periodic function of period $2\pi$ defined by

$$D_N(x) = \sum_{n=-N}^{N} e^{inx} = \frac{\sin\big((N+\tfrac12)x\big)}{\sin\tfrac12 x},$$

the second equality following from the formula for the sum of a geometric series.
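For a periodic function $f$ of period $2\pi$, the partial sums of the Fourier series of $f$ are given by

$$\begin{aligned}
s_N(f;x) &= \sum_{n=-N}^{N}\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-int}\,dt\Big)e^{inx} \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\sum_{n=-N}^{N} e^{in(x-t)}\,dt \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)D_N(x-t)\,dt \\
&= \frac{1}{2\pi}\int_{x-\pi}^{x+\pi} f(x-s)D_N(s)\,ds \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x-t)D_N(t)\,dt,
\end{aligned}$$

the last two steps following from the changes of variables $t \mapsto x + s$ (Theorem 1.34) and $s \mapsto -s$ (Proposition 1.30h) and from the periodicity of $f$ and $D_N$.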


FIGURE 1.3. Dirichlet kernel. Graph of $D_N$ for $N = 30$.


This is the kind of convolution integral that occurred in the previous section. Term-by-term integration shows that $\frac{1}{2\pi}\int_{-\pi}^{\pi} D_N(x)\,dx = 1$. However, $D_N$ is not an approximate identity, not being everywhere $\ge 0$. Figure 1.3 shows the graph of $D_N$ for $N = 30$. Although $D_N(x)$ looks small in the graph away from $x = 0$, it is small only as a percentage of $D_N(0)$; $D_N(x)$ does not have $\lim_N D_N(x)$ equal to 0 for $x \ne 0$. Thus $D_N(x)$ fails in a second way to be an approximate identity. The failure of $D_N$ to be an approximate identity is what makes the subject of convergence of Fourier series so subtle.
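
The stated properties of the Dirichlet kernel can be illustrated numerically. The sketch below is an illustration only, with hypothetical helper names: it compares the defining sum with the closed form, approximates the normalized integral by a midpoint rule, and evaluates the kernel at points where it is negative or fails to tend to 0.

```python
import math


def dirichlet_sum(N, x):
    """D_N(x) from the definition: the sum of e^{inx} over |n| <= N, written with cosines."""
    return 1.0 + 2.0 * sum(math.cos(n * x) for n in range(1, N + 1))


def dirichlet_closed_form(N, x):
    """D_N(x) = sin((N + 1/2) x) / sin(x / 2), valid when x is not a multiple of 2*pi."""
    return math.sin((N + 0.5) * x) / math.sin(0.5 * x)


def normalized_integral(N, samples=4096):
    """Midpoint-rule approximation of (1/2pi) * integral of D_N over [-pi, pi]."""
    h = 2 * math.pi / samples
    total = sum(dirichlet_sum(N, -math.pi + (k + 0.5) * h) for k in range(samples))
    return total * h / (2 * math.pi)


if __name__ == "__main__":
    N = 30
    print("closed form vs sum:", dirichlet_closed_form(N, 0.7), dirichlet_sum(N, 0.7))
    print("normalized integral:", normalized_integral(N))
    # D_N takes negative values, e.g. where (N + 1/2) x = 3*pi/2.
    print("a negative value:", dirichlet_closed_form(N, 3 * math.pi / (2 * N + 1)))
    # At x = pi the values alternate between +1 and -1, so they do not tend to 0.
    print("values at pi:", [round(dirichlet_sum(n, math.pi)) for n in range(6)])
```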


Theorem 1.57 (Dini’s test). Let $f : \mathbb{R} \to \mathbb{C}$ be periodic of period $2\pi$ and Riemann integrable on $[-\pi,\pi]$. Fix $x$ in $[-\pi,\pi]$. If there are constants $\delta > 0$ and $M < +\infty$ such that

$$|f(x+t) - f(x)| \le M|t| \qquad\text{for } |t| < \delta,$$

then $\lim_N s_N(f;x) = f(x)$.


REMARK. This condition is satisfied if $f$ is differentiable at $x$. Thus the convergence of the Fourier series in Proposition 1.56 is to the original function $f$. By contrast, the Dini condition is not satisfied at $x = 0$ for the continuous periodic extension of the function $f(x) = |x|^{1/2}$ defined on $(-\pi,\pi]$.


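PROOF. With $x$ fixed, let

$$g(t) = \begin{cases} \dfrac{f(x-t) - f(x)}{\sin t/2} & \text{for } 0 < |t| \le \pi, \\[1ex] 0 & \text{for } t = 0. \end{cases}$$

Proposition 1.30d shows that $(\sin t/2)^{-1}$ is Riemann integrable on $\varepsilon \le |t| \le \pi$ for any $\varepsilon > 0$, and hence so is $g(t)$. Since $g(t)$ is bounded near $t = 0$, Lemma 1.28 shows that $g(t)$ is Riemann integrable on $[-\pi,\pi]$. Since $\frac{1}{2\pi}\int_{-\pi}^{\pi} D_N(x)\,dx = 1$, we have

$$\begin{aligned}
s_N(f;x) - f(x)
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x-t)\,\frac{\sin\big((N+\tfrac12)t\big)}{\sin\tfrac12 t}\,dt - \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,\frac{\sin\big((N+\tfrac12)t\big)}{\sin\tfrac12 t}\,dt \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} g(t)\sin\big((N+\tfrac12)t\big)\,dt \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\big[g(t)\cos\tfrac{t}{2}\big]\sin Nt\,dt + \frac{1}{2\pi}\int_{-\pi}^{\pi}\big[g(t)\sin\tfrac{t}{2}\big]\cos Nt\,dt,
\end{aligned}$$

and both terms on the right side tend to 0 with $N$ by the Riemann–Lebesgue Lemma (Corollary 1.55).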
Dini’s test (Theorem 1.57) has implications for “localization” of the convergence of Fourier series. Suppose that $f = g$ on an open interval $I$, and suppose that the Fourier series of $f$ converges to $f$ on $I$. Then Dini’s test shows that the Fourier series of $f - g$ converges to 0 on $I$, and hence the Fourier series of $g$ converges to $g$ on $I$. For example, $f$ could be a function with a continuous derivative everywhere, and $g$ could have discontinuities outside the open interval $I$. For $f$, the proof of Proposition 1.56 shows that $\sum |c_n| < +\infty$. But for $g$, the Fourier series cannot converge so rapidly because the sum of a uniformly convergent series of continuous functions has to be continuous. Thus the two series locally have the same sum, but their qualitative behavior is quite different.

Next let us address the question of the extent to which the Fourier series of f 
uniquely determines f. Our first result in this direction will be that if f and g 
are Riemann integrable and have the same respective Fourier coefficients, then 
f (x) = g(x) at every point of continuity of both f and g. It may look as if some 
sharpening of Dini’s test might apply just under the assumption of continuity of 
the function, and then this uniqueness result would be trivial. However, as we 
shall see in Chapter XII, the Fourier series of a continuous function need not 
converge to the function at particular points, and there can be no such sharpening 
of Dini’s test. Instead, we shall handle the uniqueness question in a more indirect 
fashion. 

The technique is to use an approximate identity, as in the proof of the Weierstrass Approximation Theorem in Section 9. Although the partial sums of the Fourier series of a continuous function need not converge at every point, the Cesàro sums do converge. To get at this fact, we shall examine the Fejér kernel

$$K_N(x) = \frac{1}{N+1}\sum_{n=0}^{N} D_n(x).$$

The $N$th Cesàro sum of $s_n(f;x)$ is given by $\frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(x-t)f(t)\,dt$ because

$$\frac{1}{N+1}\sum_{n=0}^{N} s_n(f;x) = \frac{1}{N+1}\sum_{n=0}^{N}\frac{1}{2\pi}\int_{-\pi}^{\pi} D_n(x-t)f(t)\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(x-t)f(t)\,dt.$$

The remarkable fact is that the Fejér kernel is an approximate identity even though the Dirichlet kernel is not, and the result will be that the Cesàro sums of a Fourier series converge in every way that they have any hope of converging.



Lemma 1.58. The Fejér kernel is given by

$$K_N(x) = \frac{1}{N+1}\,\frac{1-\cos(N+1)x}{1-\cos x}.$$
PROOF. We show by induction on $N$ that the values of $K_N(x)$ in the definition and in the lemma are equal. For $N = 0$, we have $K_0(x) = D_0(x) = 1 = \frac{1-\cos x}{1-\cos x}$ as required. Assume the equality for $N - 1$. Then

$$\begin{aligned}
(N+1)K_N(x) &= \sum_{n=0}^{N} D_n(x) = N K_{N-1}(x) + D_N(x) \\
&= \frac{1-\cos Nx}{1-\cos x} + \frac{\sin\big((N+\tfrac12)x\big)\sin\tfrac12 x}{\sin\tfrac12 x\,\sin\tfrac12 x} \qquad\text{by induction} \\
&= \frac{1-\cos Nx + 2\sin\big((N+\tfrac12)x\big)\sin\tfrac12 x}{1-\cos x} \\
&= \frac{1-\cos Nx - \big[\cos\big((N+\tfrac12)x+\tfrac12 x\big) - \cos\big((N+\tfrac12)x-\tfrac12 x\big)\big]}{1-\cos x} \\
&= \frac{1-\cos(N+1)x}{1-\cos x},
\end{aligned}$$

as required.


In line with the definition of approximate identity in Section 9, we are to show that $K_N(x)$ has the following properties:

(i) $K_N(x) \ge 0$,

(ii) $\frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(x)\,dx = 1$,

(iii) for any $\delta > 0$, $\sup_{\delta\le|x|\le\pi} K_N(x)$ tends to 0 as $N$ tends to infinity.

Property (i) follows from the formula for $K_N(x)$ in Lemma 1.58, since $\cos x \le 1$ everywhere; (ii) follows from the definition of $K_N(x)$ and the linearity of the integral, since $\frac{1}{2\pi}\int_{-\pi}^{\pi} D_n(x)\,dx = 1$ for all $n$; and (iii) follows from Lemma 1.58, since $1 - \cos(N+1)x \le 2$ everywhere and $1 - \cos x \ge 1 - \cos\delta$ if $\delta \le |x| \le \pi$.
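
As a numerical illustration of Lemma 1.58 and property (iii), the following sketch compares the definition of the Fejér kernel with the closed form and checks the bound $2/\big((N+1)(1-\cos\delta)\big)$ that the argument above gives for the supremum. The function names and the grid-based approximation of the supremum are assumptions made for illustration.

```python
import math


def dirichlet(n, x):
    """D_n(x) as the sum of e^{ikx} over |k| <= n, written with cosines."""
    return 1.0 + 2.0 * sum(math.cos(k * x) for k in range(1, n + 1))


def fejer_from_definition(N, x):
    """K_N(x) = (1/(N+1)) * (D_0(x) + ... + D_N(x))."""
    return sum(dirichlet(n, x) for n in range(N + 1)) / (N + 1)


def fejer_closed_form(N, x):
    """K_N(x) from Lemma 1.58, valid when cos x != 1."""
    return (1.0 - math.cos((N + 1) * x)) / ((N + 1) * (1.0 - math.cos(x)))


def sup_away_from_zero(N, delta, samples=2000):
    """Grid approximation of the sup of K_N over delta <= |x| <= pi (K_N is even)."""
    step = (math.pi - delta) / samples
    return max(fejer_closed_form(N, delta + k * step) for k in range(samples + 1))


if __name__ == "__main__":
    print(fejer_from_definition(7, 0.3), fejer_closed_form(7, 0.3))
    for N in (10, 50, 200):
        bound = 2.0 / ((N + 1) * (1.0 - math.cos(0.5)))
        print(N, sup_away_from_zero(N, 0.5), "<=", bound)
```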


Theorem 1.59 (Fejér’s Theorem). Let $f : \mathbb{R} \to \mathbb{C}$ be periodic of period $2\pi$ and Riemann integrable on $[-\pi,\pi]$. If $f$ is continuous at a point $x_0$ in $[-\pi,\pi]$, then

$$\lim_{N}\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)K_N(x_0 - x)\,dx = f(x_0).$$

If $f$ is uniformly continuous on a subset $E$ of $[-\pi,\pi]$, then the convergence is uniform for $x_0$ in $E$.
PROOF. Choose $M$ such that $|f(x)| \le M$ for all $x$. By (ii) and then (i),

$$\begin{aligned}
\Big|\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)K_N(x_0 - x)\,dx - f(x_0)\Big|
&= \Big|\frac{1}{2\pi}\int_{-\pi}^{\pi}\big[f(x) - f(x_0)\big]K_N(x_0 - x)\,dx\Big| \\
&\le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - f(x_0)|\,K_N(x_0 - x)\,dx \\
&\le \frac{1}{2\pi}\int_{|x-x_0|\le\delta}|f(x) - f(x_0)|\,K_N(x_0 - x)\,dx \\
&\qquad + \frac{1}{2\pi}\int_{\delta\le|x-x_0|\le\pi} 2M\Big(\sup_{\delta\le|t|\le\pi}K_N(t)\Big)\,dx \\
&\le \frac{1}{2\pi}\int_{|x-x_0|\le\delta}|f(x) - f(x_0)|\,K_N(x_0 - x)\,dx + 2M\sup_{\delta\le|t|\le\pi}K_N(t).
\end{aligned}$$

Given $\varepsilon > 0$, choose some $\delta$ for $\varepsilon$ and continuity of $f$ at $x_0$ or for $\varepsilon$ and uniform continuity of $f$ on $E$. In the first term on the right side, we then have $|f(x) - f(x_0)| \le \varepsilon$ on the set where $|x - x_0| \le \delta$. Thus a second use of (ii) shows that the above expression is

$$\le \varepsilon + 2M\sup_{\delta\le|t|\le\pi}K_N(t).$$

With $\delta$ fixed, property (iii) shows that the right side is $< 2\varepsilon$ if $N$ is sufficiently large, and the theorem follows.
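
The contrast between partial sums and Cesàro sums can be seen numerically. The sketch below is an illustration under stated assumptions: it uses the square wave equal to $+1$ on $(0,\pi)$ and $-1$ on $(-\pi,0)$, whose Fourier series $\frac{4}{\pi}\sum_{n\ \mathrm{odd}} \frac{\sin nx}{n}$ is the one listed in Problem 16(a), and it samples on a finite grid. The function names are hypothetical.

```python
import math


def square_wave_partial_sum(N, x):
    """s_N(f; x) for the square wave: only odd sine terms, with coefficient 4/(pi n)."""
    return sum(4.0 / (math.pi * n) * math.sin(n * x) for n in range(1, N + 1, 2))


def square_wave_cesaro_sum(N, x):
    """Average of s_0, ..., s_N; the term of frequency n appears with weight 1 - n/(N+1)."""
    return sum((1.0 - n / (N + 1)) * 4.0 / (math.pi * n) * math.sin(n * x)
               for n in range(1, N + 1, 2))


def max_on_grid(g, samples=2000):
    """Maximum of g over an equally spaced grid of [0, pi]."""
    return max(g(math.pi * k / samples) for k in range(samples + 1))


if __name__ == "__main__":
    N = 41
    # The partial sums overshoot the value 1 near the jump at x = 0.
    print("max of s_N:", max_on_grid(lambda x: square_wave_partial_sum(N, x)))
    # The Cesaro sums are averages of f against a nonnegative kernel of total mass 1,
    # so they stay within the bounds of f.
    print("max of |Cesaro sum|:", max_on_grid(lambda x: abs(square_wave_cesaro_sum(N, x))))
    print("Cesaro sum at x = 1:", square_wave_cesaro_sum(400, 1.0))
```

The bound on the Cesàro sums reflects properties (i) and (ii) of the Fejér kernel; the Dirichlet kernel has no such positivity, which is why the partial sums can overshoot.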


Corollary 1.60 (uniqueness theorem). Let $f : \mathbb{R} \to \mathbb{C}$ and $g : \mathbb{R} \to \mathbb{C}$ be periodic of period $2\pi$ and Riemann integrable on $[-\pi,\pi]$. If $f$ and $g$ have the same respective Fourier coefficients, then $f(x) = g(x)$ at every point of continuity of both $f$ and $g$.

REMARK. The fact that $f$ and $g$ have the same Fourier coefficients means that $s_n(f;x) = s_n(g;x)$ for all $n$, hence that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} D_n(x-t)f(t)\,dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} D_n(x-t)g(t)\,dt$$

for all $n$. Then the same formula applies with $D_n$ replaced by its Cesàro sums $K_N$.

PROOF. Apply Theorem 1.59 to $f - g$ at a point $x_0$ of continuity of both $f$ and $g$.
Our second result about uniqueness will improve on Corollary 1.60, saying that any Riemann integrable function with all Fourier coefficients 0 is basically the 0 function, at least in the sense that any definite integral in which it is a factor of the integrand is 0. We shall prove this improved result as a consequence of Parseval’s Theorem, which says that equality holds in Bessel’s inequality. The proof of Parseval’s Theorem will be preceded by an example and some lemmas.


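Theorem 1.61 (Parseval’s Theorem). Let $f : \mathbb{R} \to \mathbb{C}$ be periodic of period $2\pi$ and Riemann integrable on $[-\pi,\pi]$. If $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$, then

$$\lim_{N\to\infty}\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - s_N(f;x)|^2\,dx = 0$$

and

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx = \sum_{n=-\infty}^{\infty}|c_n|^2.$$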

REMARK. In terms of the coefficients $a_n$ and $b_n$, the corresponding result is

$$\frac{1}{\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx = \frac{|a_0|^2}{2} + \sum_{n=1}^{\infty}\big(|a_n|^2 + |b_n|^2\big).$$

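EXAMPLE. We saw near the beginning of this section that the periodic function $f$ given by $f(x) = \frac12(\pi - x)$ on $(0, 2\pi)$ has $f(x) \sim \sum_{n=1}^{\infty}\frac{\sin nx}{n}$. The formulation of Parseval’s Theorem as in the remark, but with the interval $(0, 2\pi)$ replacing the interval $(-\pi,\pi)$, says that $\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{1}{\pi}\int_0^{2\pi}\big|\tfrac12(\pi - x)\big|^2\,dx$. The right side is $= \frac{1}{4\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{\pi^2}{6}$. Thus

$$\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}.$$

This formula was discovered by Euler by other means before the work of Fourier.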
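
The identity $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$ from the example is easy to test numerically. The following minimal sketch (the function name is hypothetical) computes partial sums and uses the elementary tail estimate $\sum_{n>N} 1/n^2 < \int_N^{\infty} x^{-2}\,dx = 1/N$, which is a standard comparison and not a statement taken from the text.

```python
import math


def partial_sum_inverse_squares(N):
    """Sum of 1/n^2 for n = 1, ..., N, added from small terms to large for accuracy."""
    return sum(1.0 / (n * n) for n in range(N, 0, -1))


if __name__ == "__main__":
    target = math.pi ** 2 / 6
    for N in (10, 100, 10000):
        s = partial_sum_inverse_squares(N)
        # The gap target - s is the tail of the series; it is positive and below 1/N.
        print(N, s, target - s)
```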


For the purposes of the lemmas and the proof of Parseval’s Theorem, let us introduce a “Hermitian inner product”³ on $\mathcal{R}[-\pi,\pi]$ by the definition

$$(f,g)_2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx,$$

as well as a “norm” defined by

$$\|f\|_2 = (f,f)_2^{1/2} = \Big(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx\Big)^{1/2}$$

and a “distance function” defined by

$$d_2(f,g) = \|f - g\|_2 = \Big(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x) - g(x)|^2\,dx\Big)^{1/2}.$$

The role of the function $d_2$ will become clearer in Chapter II, where “distance functions” of this kind will be studied extensively.

³The term “Hermitian inner product” will be defined precisely in Section II.1. The form $(f,g)_2$ comes close to being one, but it fails to meet all the conditions because $(f,f)_2 = 0$ is possible without $f = 0$.


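Lemma 1.62. If $f$ is in $\mathcal{R}[-\pi,\pi]$ and $\int_{-\pi}^{\pi}|f(x)|^2\,dx = 0$, then $\int_{-\pi}^{\pi}|f(x)|\,dx = 0$ and also $\int_{-\pi}^{\pi} f(x)g(x)\,dx = 0$ for all $g \in \mathcal{R}[-\pi,\pi]$.

PROOF. Write $M = \sup_{x\in[-\pi,\pi]}|f(x)|$, and let $\varepsilon > 0$ be given. Choose a partition $P = \{x_i\}_{i=0}^{n}$ with $U(P,|f|^2) \le \varepsilon^3$, i.e.,

$$\sum_{i=1}^{n}\Big(\sup_{x\in[x_{i-1},x_i]}|f(x)|^2\Big)\Delta x_i \le \varepsilon^3.$$

Divide the indices from 1 to $n$ into two subsets, $A$ and $B$, with

$$A = \Big\{\,i \;\Big|\; \sup_{x\in[x_{i-1},x_i]}|f(x)| \ge \varepsilon\Big\} \qquad\text{and}\qquad B = \Big\{\,i \;\Big|\; \sup_{x\in[x_{i-1},x_i]}|f(x)| < \varepsilon\Big\}.$$

The sum of the contributions from indices $i \in A$ to $U(P,|f|^2)$ is $\ge \varepsilon^2\sum_{i\in A}\Delta x_i$, and thus $\sum_{i\in A}\Delta x_i \le \varepsilon$. Hence $\sum_{i\in A}\big(\sup_{x\in[x_{i-1},x_i]}|f(x)|\big)\Delta x_i \le M\varepsilon$. Also, $\sum_{i\in B}\big(\sup_{x\in[x_{i-1},x_i]}|f(x)|\big)\Delta x_i \le 2\pi\varepsilon$. Therefore $U(P,|f|) \le (2\pi + M)\varepsilon$. Since $\varepsilon$ is arbitrary, $\int_{-\pi}^{\pi}|f(x)|\,dx = 0$. This proves the first conclusion.

For the second conclusion it follows from the boundedness of $|g|$, say by $M'$, that $\big|\int_{-\pi}^{\pi} f(x)g(x)\,dx\big| \le \int_{-\pi}^{\pi}|f(x)||g(x)|\,dx \le M'\int_{-\pi}^{\pi}|f(x)|\,dx = 0$.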


Lemma 1.63 (Schwarz inequality). If $f$ and $g$ are in $\mathcal{R}[-\pi,\pi]$, then $|(f,g)_2| \le \|f\|_2\|g\|_2$.

REMARK. Compare this result with the version of the Schwarz inequality in Section A5 of the appendix. This kind of inequality is put into a broader setting in Section II.1.

PROOF. If $\|g\|_2 = 0$, then Lemma 1.62 shows that $(f,g)_2 = 0$ for all $f$. Thus the lemma is valid in this case. If $\|g\|_2 \ne 0$, then we have

$$\begin{aligned}
0 &\le \big\|f - \|g\|_2^{-2}(f,g)_2\,g\big\|_2^2 = \big(f - \|g\|_2^{-2}(f,g)_2\,g,\; f - \|g\|_2^{-2}(f,g)_2\,g\big)_2 \\
&= \|f\|_2^2 - 2\|g\|_2^{-2}|(f,g)_2|^2 + \|g\|_2^{-4}|(f,g)_2|^2\|g\|_2^2 = \|f\|_2^2 - \|g\|_2^{-2}|(f,g)_2|^2,
\end{aligned}$$

and the lemma follows in this case as well.
Lemma 1.64 (triangle inequality). If $f$, $g$, and $h$ are in $\mathcal{R}[-\pi,\pi]$, then $d_2(f,h) \le d_2(f,g) + d_2(g,h)$.

PROOF. For any two such functions $F$ and $G$, Lemma 1.63 gives

$$\begin{aligned}
\|F + G\|_2^2 &= (F+G, F+G)_2 = (F,F)_2 + (F,G)_2 + (G,F)_2 + (G,G)_2 \\
&= \|F\|_2^2 + 2\,\mathrm{Re}\,(F,G)_2 + \|G\|_2^2 \\
&\le \|F\|_2^2 + 2\|F\|_2\|G\|_2 + \|G\|_2^2 = \big(\|F\|_2 + \|G\|_2\big)^2.
\end{aligned}$$

Taking the square root of both sides and substituting $F = f - g$ and $G = g - h$, we obtain the lemma.


Lemma 1.65. Let $f : \mathbb{R} \to \mathbb{C}$ be periodic of period $2\pi$ and Riemann integrable on $[-\pi,\pi]$, and let $\varepsilon > 0$ be given. Then there exists a continuous periodic $g : \mathbb{R} \to \mathbb{C}$ of period $2\pi$ such that $\|f - g\|_2 < \varepsilon$.


PROOF. Because of Lemma 1.64, we may assume that $f$ is real-valued and is not identically 0. Define $M = \sup_{t\in[-\pi,\pi]}|f(t)| > 0$, let $\varepsilon > 0$ be given, and let $P = \{x_i\}_{i=0}^{n}$ be a partition to be specified. Using $P$, we form the function $g$ defined by

$$g(t) = \frac{x_i - t}{\Delta x_i}\,f(x_{i-1}) + \frac{t - x_{i-1}}{\Delta x_i}\,f(x_i) \qquad\text{for } x_{i-1} \le t \le x_i.$$

The graph of $g$ interpolates the points $(x_i, f(x_i))$, $0 \le i \le n$, by line segments. Fix attention on a particular $[x_{i-1},x_i]$, and let $I = \inf_{t\in[x_{i-1},x_i]} f(t)$ and $S = \sup_{t\in[x_{i-1},x_i]} f(t)$. For $t \in [x_{i-1},x_i]$, we have $I \le g(t) \le S$. At a single point $t$ in this interval, $f(t) \ge g(t)$ implies $I \le g(t) \le f(t) \le S$, while $g(t) \ge f(t)$ implies $I \le f(t) \le g(t) \le S$. Thus in either case we have $|f(t) - g(t)| \le S - I$. Taking the supremum over $t$ in the interval and summing on $i$, we obtain $U(P,|f - g|) \le U(P,f) - L(P,f)$.

Since $|f - g|^2 \le |f - g|\,(|f| + |g|)$ and since $|f|$ and $|g|$ are both bounded by $M$, we have

$$\sup_{t\in[x_{i-1},x_i]}|f(t) - g(t)|^2 \le \sup_{t\in[x_{i-1},x_i]}|f(t) - g(t)|\,\sup_{t\in[x_{i-1},x_i]}\big(|f(t)| + |g(t)|\big) \le 2M\sup_{t\in[x_{i-1},x_i]}|f(t) - g(t)|$$

for $1 \le i \le n$. Summing on $i$ gives $U(P,|f - g|^2) \le 2M\big(U(P,f) - L(P,f)\big)$. Now we can specify $P$; it is to be any partition for which $U(P,f) - L(P,f) \le \varepsilon^2/(2M)$ and no $\Delta x_i$ is 0. Then

$$0 \le \frac{1}{2\pi}\int_{-\pi}^{\pi}|f(t) - g(t)|^2\,dt \le \frac{1}{2\pi}\,U(P,|f - g|^2) \le \frac{M}{\pi}\big(U(P,f) - L(P,f)\big) \le \frac{\varepsilon^2}{2\pi} < \varepsilon^2,$$

as required.
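
The construction in this proof can be tried out numerically. The sketch below makes several assumptions for illustration only: the partition is uniform, the sample function is the step function equal to 1 for $|x| < 1$ and 0 elsewhere on $[-\pi,\pi]$, and the norm $\|f - g\|_2$ is approximated by a midpoint rule. The function names are hypothetical.

```python
import math


def piecewise_linear_interpolant(f, n):
    """Return g interpolating f by line segments at n + 1 equally spaced points of [-pi, pi]."""
    width = 2 * math.pi / n
    xs = [-math.pi + i * width for i in range(n + 1)]
    ys = [f(x) for x in xs]

    def g(t):
        # Index i of the subinterval [xs[i], xs[i+1]] containing t, clamped to valid range.
        i = min(max(int((t + math.pi) / width), 0), n - 1)
        w = (t - xs[i]) / width
        return (1 - w) * ys[i] + w * ys[i + 1]

    return g


def l2_distance(f, g, samples=20000):
    """Midpoint-rule approximation of ((1/2pi) * integral of |f - g|^2 over [-pi, pi])^(1/2)."""
    h = 2 * math.pi / samples
    points = (-math.pi + (k + 0.5) * h for k in range(samples))
    total = sum(abs(f(x) - g(x)) ** 2 for x in points)
    return math.sqrt(total * h / (2 * math.pi))


def step(x):
    # Assumed sample function: discontinuous, but Riemann integrable, with step(-pi) == step(pi).
    return 1.0 if abs(x) < 1.0 else 0.0


if __name__ == "__main__":
    for n in (8, 32, 128, 512):
        g = piecewise_linear_interpolant(step, n)
        print(n, l2_distance(step, g))
```

The interpolant is continuous and agrees with $f$ at $\pm\pi$, so it extends periodically; the printed distances shrink as the partition is refined, even though $g$ never converges to $f$ uniformly.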
PROOF OF THEOREM 1.61. Given $\varepsilon > 0$, choose by Lemma 1.65 a continuous periodic $g$ with $\|f - g\|_2 < \varepsilon$. Write $g(x) \sim \sum_{n=-\infty}^{\infty} c_n' e^{inx}$, and put $g_N(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(x - t)g(t)\,dt$, where $K_N$ is the Fejér kernel. Fejér’s Theorem (Theorem 1.59) gives $\sup_{x\in[-\pi,\pi]}|g(x) - g_N(x)| < \varepsilon$ for $N$ sufficiently large. Since any Riemann integrable $h$ has $\|h\|_2 \le \sup_{x\in[-\pi,\pi]}|h(x)|$, we obtain $\|g - g_N\|_2 < \varepsilon$ for $N$ sufficiently large. Fixing such an $N$ and substituting from the definition of $K_N$, we have

$$g_N(x) = \frac{1}{N+1}\sum_{n=0}^{N}\frac{1}{2\pi}\int_{-\pi}^{\pi} D_n(x - t)g(t)\,dt = \frac{1}{N+1}\sum_{n=0}^{N}\sum_{k=-n}^{n} c_k' e^{ikx} = \sum_{n=-N}^{N} d_n e^{inx}$$

for suitable constants $d_n$. Theorem 1.53 and Lemma 1.64 then give

$$\begin{aligned}
\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi}|f(x)|^2\,dx - \sum_{n=-N}^{N}|c_n|^2\Big)^{1/2}
&= \Big\|f - \sum_{n=-N}^{N} c_n e^{inx}\Big\|_2 \\
&\le \Big\|f - \sum_{n=-N}^{N} d_n e^{inx}\Big\|_2 = \|f - g_N\|_2 \\
&\le \|f - g\|_2 + \|g - g_N\|_2 < \varepsilon + \varepsilon = 2\varepsilon,
\end{aligned}$$

and the result follows.


Corollary 1.66 (uniqueness theorem). Let $f : \mathbb{R} \to \mathbb{C}$ be periodic of period $2\pi$ and Riemann integrable on $[-\pi,\pi]$. If $f$ has all Fourier coefficients 0, then $\int_{-\pi}^{\pi}|f(x)|\,dx = 0$ and $\int_{-\pi}^{\pi} f(x)g(x)\,dx = 0$ for every member $g$ of $\mathcal{R}[-\pi,\pi]$.

PROOF. If $f$ has all Fourier coefficients 0, then $\int_{-\pi}^{\pi}|f(x)|^2\,dx = 0$ by Theorem 1.61. Application of Lemma 1.62 completes the proof of the corollary.


It is natural to ask which sequences $\{c_n\}$ with $\sum |c_n|^2$ finite are the sequences of Fourier coefficients of some $f \in \mathcal{R}[-\pi,\pi]$. To see that this is a difficult question, one has only to compare the two series $\sum_{n=1}^{\infty} n^{-1}\sin nx$ and $\sum_{n=1}^{\infty} n^{-1}\cos nx$ studied at the beginning of this section. The first series comes from a function in $\mathcal{R}[-\pi,\pi]$, but a little argument shows that the second does not. It was an early triumph of Lebesgue integration that this question has an elegant answer when the Riemann integral is replaced by the Lebesgue integral: the answer when the Lebesgue integral is used is given by the Riesz–Fischer Theorem in Chapter VI, namely, any sequence with $\sum |c_n|^2$ finite is the sequence of Fourier coefficients of a square-integrable function.
11. Problems


1. Derive the least-upper-bound property (Theorem 1.1) from the convergence of bounded monotone increasing sequences (Corollary 1.6).

2. According to Newton’s method, to find numerical approximations to $\sqrt{a}$ when $a > 0$, one can set $x_0 = 1$ and define $x_{n+1} = \frac12(x_n^2 + a)/x_n$ for $n \ge 0$. Prove that $\{x_n\}$ converges and that the limit is $\sqrt{a}$.

3. Find $\limsup a_n$ and $\liminf a_n$ when $a_n$ is defined by $a_1 = 0$, $a_{2n} = \frac12 a_{2n-1}$, $a_{2n+1} = \frac12 + a_{2n}$. Prove that your answers are correct.

4. For any two sequences $\{a_n\}$ and $\{b_n\}$ in $\mathbb{R}$, prove that $\limsup(a_n + b_n) \le \limsup a_n + \limsup b_n$, provided the two terms on the right side are not $+\infty$ and $-\infty$ in some order.


5. Which of the following limits exist uniformly for $0 \le x \le 1$: (i) $\lim_{n\to\infty} x^n$, (ii) $\lim_{n\to\infty} x^n/n$, (iii) $\lim_{n\to\infty}\sum_{k=1}^{n} x^k/k$? Supply proofs for those that do converge uniformly. For the other ones, prove anyway that there is uniform convergence on any interval $0 \le x \le 1 - \varepsilon$, where $\varepsilon > 0$.


6. Let $a_n(x) = (-1)^n x^n(1 - x)$ on $[0,1]$. Show that $\sum_{n=0}^{\infty} a_n(x)$ converges uniformly and that $\sum_{n=0}^{\infty}|a_n(x)|$ converges pointwise but not uniformly.

7. (Dini’s Theorem) Suppose that $f_n : [a,b] \to \mathbb{R}$ is continuous and that $f_1 \le f_2 \le f_3 \le \cdots$. Suppose also that $f(x) = \lim f_n(x)$ is continuous and is nowhere $+\infty$. Use the Bolzano–Weierstrass Theorem (Theorem 1.8) to prove that $f_n$ converges to $f$ uniformly for $a \le x \le b$.


8. Prove that


veg St ae eh ah ae IS 


for all x > 0. 


9. Let $f : (-\infty,+\infty) \to \mathbb{R}$ be infinitely differentiable with $|f^{(n)}(x)| \le 1$ for all $n$ and $x$. Use Taylor’s Theorem (Theorem 1.36) to prove that

$$f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}\,x^n$$

for all $x$.

10. (Helly’s Selection Principle) Suppose that $\{F_n\}$ is a sequence of nondecreasing functions on $[-1,1]$ with $0 \le F_n(x) \le 1$ for all $n$ and $x$. Using a diagonal process twice, prove that there is a subsequence $\{F_{n_k}\}$ that converges pointwise on $[-1,1]$.

11. Prove that the radius of convergence of $\sum_{n=0}^{\infty} a_n x^n$ is $1/\limsup\sqrt[n]{|a_n|}$.


12. Find a power series expansion for each of the following functions, and find the radius of convergence:

(a) $1/(1-x)^2 = \frac{d}{dx}\big[(1-x)^{-1}\big]$,

(b) $\log(1-x) = -\int_0^x \frac{dt}{1-t}$,

(c) $1/(1+x^2)$,

(d) $\arctan x = \int_0^x \frac{dt}{1+t^2}$.

13. Prove, along the lines of the proof of Corollary 1.46a, that $\cos x$ has an inverse function $\arccos x$ defined for $0 < x < \pi$ and that the inverse function is differentiable. Find an explicit formula for the derivative of $\arccos x$. Relate $\arccos x$ to $\arcsin x$ when $0 \le x \le \pi/2$.

14. State and prove uniform versions of Abel’s Theorem (Theorem 1.48) and of the corresponding theorem about Cesàro sums (Theorem 1.47), the uniformity being with respect to a parameter $x$.

15. Prove that the partial sums $\sum_{n=1}^{N}\cos n\theta$ and $\sum_{n=1}^{N}\sin n\theta$ are uniformly bounded on any set $\varepsilon \le \theta \le 2\pi - \varepsilon$ if $\varepsilon > 0$.


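16. Verify the following calculations of Fourier series:

(a) $f(x) = \begin{cases} +1 & \text{for } 0 < x < \pi \\ -1 & \text{for } -\pi < x < 0 \end{cases}$ has $f(x) \sim \dfrac{4}{\pi}\sum_{n=1}^{\infty}\dfrac{\sin(2n-1)x}{2n-1}$.

(b) $f(x) = e^{-i\alpha x}$ on $(0, 2\pi)$ has $f(x) \sim \dfrac{e^{-i\pi\alpha}\sin\pi\alpha}{\pi}\sum_{n=-\infty}^{\infty}\dfrac{e^{inx}}{n+\alpha}$, provided $\alpha$ is not an integer.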


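17. Combining Parseval’s Theorem (Theorem 1.61) with the results of Problem 16, prove the following identities:

(a) $\sum_{n=1}^{\infty}\dfrac{1}{(2n-1)^2} = \dfrac{\pi^2}{8}$,

(b) $\sum_{n=-\infty}^{\infty}\dfrac{1}{|n+\alpha|^2} = \dfrac{\pi^2}{\sin^2\pi\alpha}$.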


Problems 18–19 identify the continuous functions $f : \mathbb{R} \to \mathbb{C}$ with $f(x)f(y) = f(x + y)$ for all $x$ and $y$ as the 0 function and the functions $f(x) = e^{cx}$, using two different kinds of techniques from the chapter.

18. Put $F(x) = \int_0^x f(t)\,dt$. Find an equation satisfied by $F$, and use it to show that $f$ is differentiable everywhere. Then show that $f'(y) = f'(0)f(y)$, and deduce the form of $f$.

19. Proceed without using integration. Using continuity, find $x_0 > 0$ such that the expression $|f(x) - 1|$ is suitably small when $|x| \le |x_0|$. Show that $f(2^{-k}x_0)$ is then uniquely determined in terms of $f(x_0)$ for all $k \ge 0$. If $f$ is not identically 0, use $x_0$ to define $c$. Then verify that $f(x) = e^{cx}$ for all $x$.


Problems 20–22 construct a nonzero infinitely differentiable function $f : \mathbb{R} \to \mathbb{R}$ having all derivatives equal to 0 at one point.

20. Let $P(x)$ and $Q(x)$ be two polynomials with $Q$ not the zero polynomial. Prove that

$$\lim_{x\to0}\frac{P(x)}{Q(x)}\,e^{-1/x^2} = 0.$$

21. With $P$ and $Q$ as in the previous problem, use the Mean Value Theorem to prove that the function $g : \mathbb{R} \to \mathbb{R}$ with

$$g(x) = \begin{cases} \dfrac{P(x)}{Q(x)}\,e^{-1/x^2} & \text{for } x \ne 0, \\[1ex] 0 & \text{for } x = 0, \end{cases}$$

has $g'(0) = 0$ and that $g'$ is continuous.

22. Prove that the function $f : \mathbb{R} \to \mathbb{R}$ with

$$f(x) = \begin{cases} e^{-1/x^2} & \text{for } x \ne 0, \\ 0 & \text{for } x = 0, \end{cases}$$

is infinitely differentiable with derivatives of all orders equal to 0 at $x = 0$.


Problems 23–26 concern a generalization of Cesàro and Abel summability. A Silverman–Toeplitz summability method refers to the following construction: One starts with a system $\{M_{ij}\}_{i,j\ge0}$ of nonnegative real numbers with the two properties that (i) $\sum_j M_{ij} = 1$ for all $i$ and (ii) $\lim_{i\to\infty} M_{ij} = 0$ for all $j$. The method associates to a complex sequence $\{s_n\}_{n\ge0}$ the complex sequence $\{t_n\}_{n\ge0}$ with $t_i = \sum_{j\ge0} M_{ij}s_j$ as if the process were multiplication by the infinite square matrix $\{M_{ij}\}$ on infinite column vectors.

23. Prove that if $\{s_n\}$ is a convergent sequence with limit $s$, then the corresponding sequence $\{t_n\}$ produced by a Silverman–Toeplitz summability method converges and has limit $s$.

24. Exhibit specific matrices $\{M_{ij}\}$ that produce the effects of Cesàro and Abel summability, the latter along a sequence $r_i$ increasing to 1.

25. Let $r_i$ be a sequence increasing to 1, and define $M_{ij} = (j + 1)(r_i)^j(1 - r_i)^2$. Show that $\{M_{ij}\}$ defines a Silverman–Toeplitz summability method.

26. Using the system $\{M_{ij}\}$ in the previous problem, prove the following: if a bounded sequence $\{s_n\}$ is not necessarily convergent but is Cesàro summable to a limit $\sigma$, then $\{s_n\}$ is Abel summable to the same limit $\sigma$.


Problems 27–29 concern the Poisson kernel, which plays the same role for Abel sums of Fourier series that the Fejér kernel plays for Cesàro sums. For $0 \le r < 1$, define the Poisson kernel $P_r(\theta)$ to be the $r$th Abel sum of the Dirichlet kernel $D_n(\theta) = 1 + \sum_{k=1}^{n}(e^{ik\theta} + e^{-ik\theta})$. In the terminology of Section 8 this means that $a_0 = 1$ and $a_k = e^{ik\theta} + e^{-ik\theta}$ for $k > 0$, so that the sequence of partial sums $\sum_{k=0}^{n} a_k$ is exactly the sequence whose $n$th term is $D_n(\theta)$. The $r$th Abel sum $\sum_{n=0}^{\infty} a_n r^n$ is therefore the expression

$$P_r(\theta) = \sum_{n=-\infty}^{\infty} r^{|n|}e^{in\theta}.$$

27. For $f$ in $\mathcal{R}[-\pi,\pi]$, verify that the $r$th Abel sum of $s_n(f;\theta)$ is given by the expression $\frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\theta - \varphi)f(\varphi)\,d\varphi$.

28. Verify that $P_r(\theta) = \dfrac{1 - r^2}{1 - 2r\cos\theta + r^2}$. Deduce that $P_r(\theta)$ has the following properties:
(i) $P_r(\theta) \ge 0$,
(ii) $\frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\theta)\,d\theta = 1$,
(iii) for any $\delta > 0$, $\sup_{\delta\le|\theta|\le\pi} P_r(\theta)$ tends to 0 as $r$ increases to 1.

29. Let $f : \mathbb{R} \to \mathbb{C}$ be periodic of period $2\pi$ and Riemann integrable on $[-\pi,\pi]$.
(a) Prove that if $f$ is continuous at a point $\theta_0$ in $[-\pi,\pi]$, then

$$\lim_{r\uparrow1}\frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\theta_0 - \theta)f(\theta)\,d\theta = f(\theta_0).$$

(b) Prove that if $f$ is uniformly continuous on a subset $E$ of $[-\pi,\pi]$, then the convergence in (a) is uniform for $\theta_0$ in $E$.


Problems 30–35 lead to a proof without complex-variable theory (and in particular without the complex logarithm) that $\exp\big(z + \frac12 z^2 + \frac13 z^3 + \cdots\big) = 1/(1 - z)$ for all complex $z$ with $|z| < 1$.

30. Suppose that $R > 0$, that $f_k(x) = \sum_{n=0}^{\infty} c_{n,k}x^n$ is convergent for $|x| < R$, that $c_{n,k} \ge 0$ for all $n$ and $k$, and that $\lim_{k\to\infty} f_k(x) = f(x)$ uniformly for $|x| \le r$ whenever $r < R$. Prove for each $r < R$ that some subsequence $\{f_{k_l}\}$ of $\{f_k\}$ has $\lim_{l\to\infty} f'_{k_l}(x)$ existing uniformly for $|x| \le r$.

31. In the setting of the previous problem, prove that $f$ is infinitely differentiable for $|x| < R$.

32. In the setting of the previous two problems, use Taylor’s Theorem to show that $f(x)$ is the sum of its infinite Taylor series for $|x| < R$.

33. If $0 < r < 1$, prove for $|z| \le r$ that $\big|\frac{1}{N}z^N + \frac{1}{N+1}z^{N+1} + \cdots\big| \le r^N/(1 - r)$, and deduce that $\exp\big(\frac{1}{N}z^N + \frac{1}{N+1}z^{N+1} + \cdots\big)$ converges to 1 uniformly for $|z| \le r$.

34. Why is it true that if a power series $\sum_{n=0}^{\infty} c_n z^n$ with complex coefficients sums to 0 for all real $z$ with $|z| < R$, then it sums to 0 for all complex $z$ with $|z| < R$?

35. Prove that $\exp\big(z + \frac12 z^2 + \frac13 z^3 + \cdots\big) = 1/(1 - z)$ for all complex $z$ with $|z| < 1$.


CHAPTER II 


Metric Spaces 


Abstract. This chapter is about metric spaces, an abstract generalization of the real line that allows 
discussion of open and closed sets, limits, convergence, continuity, and similar properties. The usual 
distance function for the real line becomes an example of a metric. The other notions are defined in 
terms of the metric. The advantage of the generalization is that proofs of certain properties of the 
real line immediately go over to all other examples. 

Section 1 gives the definition of metric space and open set, and it lists a number of important 
examples, including Euclidean spaces and certain spaces of functions. 

Sections 2 through 4 develop properties of open and closed sets, continuity, and convergence of 
sequences that are simple generalizations of known facts about R. 

Section 5 shows how a subset of a metric space can be made into a metric space so that the 
restriction of a continuous function from the whole space to the subset remains continuous. It also 
shows that three natural metrics for the product of two metric spaces lead to the same open sets, 
continuous functions, and convergent sequences. 

Section 6 shows that any metric space is “Hausdorff,” “regular,” and “normal,” and it goes on to 
exhibit three different countability hypotheses about a metric space as equivalent. A metric space 
with these properties is called “separable.” 

Section 7 concerns compactness and completeness. A metric space is defined to be “compact” 
if every open cover has a finite subcover. This property is equivalent to the condition that every 
sequence has a convergent subsequence. The Heine–Borel Theorem says that the compact sets of
$\mathbb{R}^n$ are exactly the closed bounded sets. A number of the results early in Chapter I that were proved
by the Bolzano—Weierstrass Theorem in the context of the real line are seen to extend to any compact 
metric space. A metric space is “complete” if every Cauchy sequence is convergent. A metric space 
is compact if and only if it is complete and “totally bounded.” 

Section 8 concerns connectedness, which is an abstraction of the property of an interval of the 
line that accounts for the Intermediate Value Theorem. 

Section 9 proves a fundamental result known as the Baire Category Theorem. A sample con- 
sequence of the theorem is that the pointwise limit of a sequence of continuous complex-valued 
functions on a complete metric space must have points where it is continuous. 

Section 10 studies the spaces of real-valued and complex-valued continuous functions on a 
compact metric space. A generalization of Ascoli’s Theorem from the setting of Chapter I provides a 
characterization of compact sets in either of these spaces of continuous functions. A generalization of 
the Weierstrass Approximation Theorem, known as the Stone—Weierstrass Theorem, gives sufficient 
conditions for a subalgebra of either of these spaces of continuous functions to be dense. One 
consequence is that these spaces of continuous functions are separable. 

Section 11 constructs the “completion” of a metric space out of Cauchy sequences in the given 
space. The result is a complete metric space and a distance-preserving map of the given metric space 
into the completion such that the image is dense. 
Let $X$ be a nonempty set. A function $d$ from $X \times X$, the set of ordered pairs of members of $X$, to the real numbers is a metric, or distance function, if

(i) $d(x,y) \ge 0$ always, with equality if and only if $x = y$,

(ii) $d(x,y) = d(y,x)$ for all $x$ and $y$ in $X$,

(iii) $d(x,y) \le d(x,z) + d(z,y)$ for all $x$, $y$, and $z$, the triangle inequality.

In this case the pair $(X,d)$ is called a metric space.

The real line $\mathbb{R}^1$ with metric $d(x,y) = |x - y|$ is the motivating example. Properties (i) and (ii) are apparent, and property (iii) is readily verified one case at a time according as $z$ is less than both $x$ and $y$, $z$ is between $x$ and $y$, or $z$ is greater than both $x$ and $y$.

We come to further examples in a moment. Particularly in the case that $X$ is a space of functions, a space may turn out to be almost a metric space but not to satisfy the condition that $d(x,y) = 0$ implies $x = y$. Accordingly we introduce a weakened version of (i) as

(i′) $d(x,y) \ge 0$ and $d(x,x) = 0$ always,

and we say that a function $d$ from $X \times X$ to the real numbers is a pseudometric if (i′), (ii), and (iii) hold. In this case, $(X,d)$ is called a pseudometric space.
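
The difference between a metric and a pseudometric can be made concrete on a finite sample of points. The sketch below is an illustration only: the helper `check_axioms` and the two sample distance functions on pairs of integers are hypothetical, and checking the axioms on finitely many points can refute them but cannot prove them in general.

```python
from itertools import product


def check_axioms(points, d):
    """Report which of the defining conditions hold for d on a finite sample of points."""
    pairs = list(product(points, repeat=2))
    return {
        # Weakened condition: d >= 0 and d(x, x) = 0.
        "nonnegative_zero_on_diagonal": all(d(x, y) >= 0 for x, y in pairs)
        and all(d(x, x) == 0 for x in points),
        # Extra requirement of a metric: d(x, y) = 0 only when x = y.
        "separates_points": all(d(x, y) > 0 for x, y in pairs if x != y),
        "symmetric": all(d(x, y) == d(y, x) for x, y in pairs),
        "triangle_inequality": all(
            d(x, y) <= d(x, z) + d(z, y) for x, y, z in product(points, repeat=3)
        ),
    }


def taxicab(p, q):
    """A metric on pairs of numbers: sum of coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def first_coordinate_gap(p, q):
    """A pseudometric that is not a metric: it ignores the second coordinate."""
    return abs(p[0] - q[0])


if __name__ == "__main__":
    sample = [(a, b) for a in range(-2, 3) for b in range(-2, 3)]
    print("taxicab:", check_axioms(sample, taxicab))
    print("first coordinate gap:", check_axioms(sample, first_coordinate_gap))
```

The second function gives distance 0 between distinct points such as $(0,0)$ and $(0,1)$, which is exactly the situation the weakened condition is designed to allow.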

Let $(X,d)$ be a pseudometric space. If $r > 0$, the open ball of radius $r$ and center $x$, denoted by $B(r;x)$, is the set of points at distance less than $r$ from $x$, namely

$$B(r;x) = \{\,y \in X \mid d(x,y) < r\,\}.$$

The name “ball” will be appropriate in Euclidean space in dimension three, which is part of Example 1 below, and “ball” is adopted for the corresponding notion in a general pseudometric space.

A subset U of X is open if for each x in U and some sufficiently small r > 0, 
the open ball B(r; x) is contained in U. For the line the open balls in the above 
sense are just the bounded open intervals, and the open sets in the above sense 
are the usual open sets in the sense of Chapter I. 


Lemma 2.1. In any pseudometric space (X, d), every open ball is an open set. 
The open sets are exactly all possible unions of open balls. 


PROOF. Let an open ball B(r; x) be given. If y is in B(r; x), then the open ball 
B(r − d(x, y); y) has center y and positive radius; we show that it is contained 
in B(r; x). In fact, if z is in B(r − d(x, y); y), then the triangle inequality gives 


d(x, z) ≤ d(x, y) + d(y, z) < d(x, y) + (r − d(x, y)) = r, 


and the containment follows. 
For the second assertion it follows from the definition of open set that every 
open set is the union of open balls. In the reverse direction, let U be a union of 
open balls. If y is in U, then y lies in one of these balls, say in B(r; x). We have 
just shown that some open ball B(s; y) is contained in B(r; x), and B(r; x) is 
contained in U. Thus B(s; y) is contained in U, and U is open. 


EXAMPLES. 


(1) Euclidean space R^n. Fix an integer n > 0. Let R^n be the space of 
all n-tuples of real numbers x = (x_1, ..., x_n). We define addition of n-tuples 
componentwise, and we define scalar multiplication by cx = (cx_1, ..., cx_n) for 
real c. Following the normal convention in linear algebra, we identify this space 
with the real vector space, also denoted by R^n, of all n-component column vectors 
of real numbers 

$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$$

Generalizing the notion of absolute value when n = 1, we let 

$$|x| = \Big(\sum_{j=1}^{n} x_j^2\Big)^{1/2}$$

for x = (x_1, ..., x_n) in R^n. The quantity |x| is the Euclidean norm of x. 
The Euclidean norm satisfies the properties 


(a) |x| ≥ 0 always, with equality if and only if x equals the zero tuple 
0 = (0, ..., 0), 

(b) |cx| = |c||x| for all x and for all real c, 

(c) |x + y| ≤ |x| + |y| for all x and y. 


Properties (a) and (b) are apparent, but (c) requires proof. The proof makes use 
of the familiar dot product, given by x · y = x_1 y_1 + ··· + x_n y_n if x = (x_1, ..., x_n) 
and y = (y_1, ..., y_n). In terms of dot product, the Euclidean norm is nothing 
more than |x| = (x · x)^{1/2}. The dot product satisfies the important inequality 
|x · y| ≤ |x||y|, known as the Schwarz inequality and proved for this context in 
Section A5 of the appendix. A more general version of the Schwarz inequality 
will be stated and proved in Lemma 2.2 below. The Schwarz inequality implies 
(c) above because we then have 

$$\begin{aligned}
|x + y|^2 &= (x + y) \cdot (x + y) = x \cdot x + 2(x \cdot y) + y \cdot y \\
&= |x|^2 + 2(x \cdot y) + |y|^2 \le |x|^2 + 2|x||y| + |y|^2 = (|x| + |y|)^2.
\end{aligned}$$


We make X = R^n into a metric space (X, d) by defining 
d(x, y) = |x − y|. 
Properties (i) and (ii) of a metric are immediate from (a) and (b), respectively; 
property (iii) follows from (c) in the form |a + b| ≤ |a| + |b| if we substitute 
a = x − z and b = z − y. For n = 1, this example reduces to the line as 
discussed above. For n = 2, open balls are geometric open disks, while for 
n = 3, open balls are geometric open balls. For any n, the open sets in the metric 
space coincide with the open sets as defined in calculus of several variables. 
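
As a numerical sanity check on the Schwarz inequality and on property (iii) for R^n, the following Python sketch tests both on random vectors. The helper names, the dimension 4, the sampling range, and the rounding tolerance are assumptions made for illustration; a random test of this kind is evidence, not a proof.

```python
import math
import random

def dot(x, y):
    """Dot product x . y of two n-tuples."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm |x| = (x . x)^(1/2)."""
    return math.sqrt(dot(x, x))

def dist(x, y):
    """Euclidean metric d(x, y) = |x - y|."""
    return norm([a - b for a, b in zip(x, y)])

def random_vector(n, rng):
    return [rng.uniform(-10.0, 10.0) for _ in range(n)]

rng = random.Random(0)   # fixed seed so that the run is repeatable
tol = 1e-9               # allowance for floating-point rounding
all_ok = True
for _ in range(1000):
    x = random_vector(4, rng)
    y = random_vector(4, rng)
    z = random_vector(4, rng)
    schwarz = abs(dot(x, y)) <= norm(x) * norm(y) + tol
    triangle = dist(x, y) <= dist(x, z) + dist(z, y) + tol
    all_ok = all_ok and schwarz and triangle
print(all_ok)
```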


(2) Complex Euclidean space C^n. The space C of complex numbers, with 
distance function d(z, w) = |z − w| as in Section I.5, can be seen in two ways to 
be a metric space. One way was carried out in Section I.5 and directly uses the 
properties of the absolute value function |z| in Section A4 of the appendix. The 
other way is to identify z = x + iy with the member (x, y) of R^2, and then the 
absolute value |z| equals the Euclidean norm |(x, y)| in the sense of Example 1; 
hence the construction of Example 1 makes the set of complex numbers into a 
metric space. More generally the complex vector space C^n of n-tuples 


z = (z_1, ..., z_n) = (x_1, ..., x_n) + i(y_1, ..., y_n) = x + iy 


becomes a metric space in two equivalent ways. One way is to define the norm 
$|z| = \big(\sum_{j=1}^{n} |z_j|^2\big)^{1/2}$ as a generalization of the Euclidean norm for R^n; then we 
put d(z, w) = |z − w|. The argument that d satisfies the triangle inequality is a 
variant of the one for R^n: The object for C^n that generalizes the dot product for 
R^n is the Hermitian inner product 

$$(z, w) = \sum_{j=1}^{n} z_j \overline{w_j}.$$

The Euclidean norm is given in terms of this expression by |z| = (z, z)^{1/2}, and the 
version of the Schwarz inequality in Section A5 of the appendix is general enough 
to show that |(z, w)| ≤ |z||w|. The same argument as for Example 1 shows that the 
norm satisfies the triangle inequality, and then it follows that d satisfies the triangle 
inequality. The other way to view C^n as a metric space is to identify C^n with R^{2n} 
by (z_1, ..., z_n) ↦ (x_1, ..., x_n, y_1, ..., y_n) and then to use the metric on R^{2n} from 
Example 1. This is the same metric, since $\sum_{j=1}^{n} |z_j|^2 = \sum_{j=1}^{n} x_j^2 + \sum_{j=1}^{n} y_j^2$. 
We still get the same metric if we instead use the identification (z_1, ..., z_n) ↦ 
(x_1, y_1, ..., x_n, y_n). With either identification the real part of the Hermitian inner 
product (z, w) for C^n corresponds to the ordinary dot product for R^{2n}. 


(3) System R* of extended real numbers. The function f(x) = x/(1 + x) 
carries [0, +∞) into [0, +1) and has g(y) = y/(1 − y) as a two-sided inverse. 
Therefore f is one-one and onto. We can extend f so that it carries (−∞, +∞) 
one-one onto (−1, +1) by putting f(x) = x/(1 + |x|). We can extend f further 
by putting f(−∞) = −1 and f(+∞) = +1, and then f carries [−∞, +∞], 
i.e., all of R*, one-one onto [−1, +1]. The function f is nondecreasing on 
[−∞, +∞]. For x and x′ in R*, let 


d(x, x′) = |f(x) − f(x′)|. 
We shall show that d is a metric. By inspection, d satisfies properties (i) and (ii) 
of a metric, and we are to prove the triangle inequality (iii), namely that 


d(x, x′) ≤ d(x, x″) + d(x″, x′). 


The critical fact is that f is nondecreasing. Since d satisfies (ii), we may assume 
that x ≤ x′, and then 


d(x, x′) = f(x′) − f(x). 


We divide the proof into three cases, depending on the location of x″ relative to 
x and x′. The first case is that x″ ≤ x, and then 


d(x, x″) + d(x″, x′) = f(x) − f(x″) + f(x′) − f(x″). 


Thus the question is whether 


f(x′) − f(x) ≤ f(x) − f(x″) + f(x′) − f(x″), 


hence whether 

2f(x″) ≤ 2f(x). 


This inequality holds, since f is nondecreasing. The second case is that x ≤ 
x″ ≤ x′, and then 


d(x, x″) + d(x″, x′) = f(x″) − f(x) + f(x′) − f(x″) = f(x′) − f(x). 


Hence equality holds in the triangle inequality. The third case is that x′ ≤ x″, 
and then 


d(x, x″) + d(x″, x′) = f(x″) − f(x) + f(x″) − f(x′). 


The triangle inequality comes down to the question whether 


2f(x′) ≤ 2f(x″). 


This inequality holds, since f is nondecreasing. We conclude that (R*, d) is a 
metric space. It is not hard to see that the open balls in R* are all intervals (a, b), 
[−∞, b), (a, +∞], and [−∞, +∞] with −∞ ≤ a < b ≤ +∞. Each of these 
open balls in R* intersects R in an ordinary open interval, bounded or unbounded. 
The open sets in R therefore coincide with the intersections of R with the open 
sets of R*. 
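
The metric on R* can be computed directly in floating-point arithmetic, where `math.inf` stands in for +∞. The sketch below is only an illustration under that assumption; the sample of points and the tolerance are choices made here, not part of the text.

```python
import math
from itertools import product

def f(x):
    """Nondecreasing map of [-inf, +inf] one-one onto [-1, +1]."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return -1.0
    return x / (1.0 + abs(x))   # the formula gives nan at +-inf, hence the cases above

def d(x, y):
    """Metric on the extended reals: d(x, y) = |f(x) - f(y)|."""
    return abs(f(x) - f(y))

print(d(0.0, math.inf))          # |0 - 1| = 1.0
print(d(-math.inf, math.inf))    # |-1 - 1| = 2.0
print(d(1.0, 3.0))               # |1/2 - 3/4| = 0.25

# Spot-check the triangle inequality on a sample that includes both infinities.
sample = [-math.inf, -5.0, -1.0, 0.0, 0.5, 2.0, 100.0, math.inf]
triangle_ok = all(d(x, y) <= d(x, z) + d(z, y) + 1e-12
                  for x, y, z in product(sample, repeat=3))
print(triangle_ok)
```

Every distance in this metric is at most 2, so the whole space R* is contained in a single open ball of radius greater than 2.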
(4) Bounded functions in the uniform metric. Let S be a nonempty set, and 
let X = B(S) be the set of all “scalar”-valued functions f on S that are bounded 
in the sense that |f(s)| ≤ M for all s ∈ S and for a constant M depending on 
f. The scalars are allowed to be the members of R or the members of C, and 
it will ordinarily make no difference which one is understood. If it does make a 
difference, we shall write B(S, R) or B(S, C) to be explicit about the range. For 
f and g in B(S), let 

$$d(f, g) = \sup_{s \in S} |f(s) - g(s)|.$$


It is easy to verify that (X, d) is a metric space. Let us not lose sight of the fact 
that the members of X are functions. When we discuss convergence of sequences 
in a metric space, we shall see that a sequence of functions in this X converges if 
and only if the sequence of functions converges uniformly on S. 


(5) Generalization of Example 4. We can replace the range R or C of the 
functions in Example 4 by any metric space (R, ρ). Fix a point r_0 in the range 
R. A function f : S → R is bounded if ρ(f(s), r_0) ≤ M for all s and for some 
M depending on f. This definition is independent of the choice of r_0 because ρ 
is assumed to satisfy the triangle inequality. If we let X be the space of all such 
bounded functions from S to R, we can make X into a metric space by defining 
$d(f, g) = \sup_{s \in S} \rho(f(s), g(s))$. 


(6) Sequence space ℓ^2. This is the space of all sequences {c_n}, −∞ < n < ∞, of scalars 
with $\sum_{n=-\infty}^{\infty} |c_n|^2 < \infty$. A metric is given by 

$$d(\{c_n\}, \{d_n\}) = \Big(\sum_{n=-\infty}^{\infty} |c_n - d_n|^2\Big)^{1/2}.$$

In the case of complex scalars, this example arises as a natural space containing 
all systems of Fourier coefficients of Riemann integrable functions on [−π, π], 
in the sense of Chapter I. Proving the triangle inequality involves arguing as in 
Examples 1 and 2 above and then letting the number of terms tend to infinity. 
The role of the dot product is played by 

$$(\{c_n\}, \{d_n\}) = \sum_{n=-\infty}^{\infty} c_n \overline{d_n}.$$

(7) Indiscrete space. If X is any nonempty set and if d(x, y) = 0 for all x 
and y, then d is a pseudometric and the only open sets are X and the empty set 
∅. If X contains more than one element, then d is not a metric. 


(8) Discrete metric. If X is any nonempty set and if 

$$d(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y, \end{cases}$$

then d is a metric, and every subset of X is open. 
(9) Let S be a nonempty set, fix an integer n > 0, and let X be the set of 
n-tuples of members of S. For n-tuples x = (x_1, ..., x_n) and y = (y_1, ..., y_n), 
define 

d(x, y) = #{j | x_j ≠ y_j}, 
the number of components in which x and y differ. Then (X, d) is a metric space. 
The proof of the triangle inequality requires a little argument, but we leave that 
for Problem 1 at the end of the chapter. Every subset of X is open, just as with 
the discrete metric in Example 8. 
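
The metric of Example 9 is easy to compute. The Python sketch below, in which the function name `hamming` and the test data are assumptions made for illustration, counts the differing components and then checks the triangle inequality exhaustively in the small case S = {0, 1}, n = 3. The exhaustive check is not a substitute for the argument requested in Problem 1.

```python
from itertools import product

def hamming(x, y):
    """Number of components in which the n-tuples x and y differ."""
    if len(x) != len(y):
        raise ValueError("tuples must have the same length n")
    return sum(1 for a, b in zip(x, y) if a != b)

# Strings serve as tuples of characters; these two differ in three positions.
print(hamming("karolin", "kathrin"))   # 3

# Exhaustive triangle-inequality check for S = {0, 1} and n = 3.
X = list(product([0, 1], repeat=3))
triangle_ok = all(hamming(x, y) <= hamming(x, z) + hamming(z, y)
                  for x in X for y in X for z in X)
print(triangle_ok)                     # True
```

Since distinct points are at distance at least 1, the open ball B(1; x) is the one-point set {x}, which is why every subset of X is open.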

(10) Hedgehog space. Let X be R^2, and single out the origin for special 
attention. Let d be the metric of Euclidean space, and define 

$$\rho(x, y) = \begin{cases} d(x, y) & \text{if } x \text{ and } y \text{ are on the same ray from } 0, \\ d(x, 0) + d(0, y) & \text{otherwise.} \end{cases}$$

Then ρ is a metric. Every open set in (X, d) is open in (X, ρ), but a set like the 
one in Figure 2.1 is open in (X, ρ) but not in (X, d). 


FIGURE 2.1. An open set centered at the origin in the hedgehog space. 
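
To see concretely how the hedgehog metric differs from the Euclidean one, the following Python sketch computes both. The function names are hypothetical, and the exact test `cross == 0` for lying on a common ray is reliable only for coordinates, such as small integers, on which the arithmetic is exact; that restriction is an assumption of the sketch.

```python
import math

def euclid(x, y):
    """Euclidean metric d on R^2."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

def same_ray(x, y):
    """True if x and y lie on a common ray from the origin (the origin lies on every ray)."""
    cross = x[0] * y[1] - x[1] * y[0]
    inner = x[0] * y[0] + x[1] * y[1]
    return cross == 0 and inner >= 0

def hedgehog(x, y):
    """Hedgehog metric: Euclidean along a ray, otherwise travel through the origin."""
    origin = (0.0, 0.0)
    if same_ray(x, y):
        return euclid(x, y)
    return euclid(x, origin) + euclid(origin, y)

print(hedgehog((1, 0), (3, 0)))      # same ray: 2.0
print(hedgehog((1, 0), (0, 1)))      # different rays: 1 + 1 = 2.0
print(euclid((1, 0), (0, 1)))        # Euclidean distance is only sqrt(2)
print(hedgehog((1, 0), (1, 0.1)))    # Euclidean-close points can be hedgehog-far
```

The last line illustrates why a short segment of a ray about the point (1, 0) is an open ball for the hedgehog metric but is not open for d: nearby points off the ray are at hedgehog distance greater than 2 from (1, 0).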


(11) Hilbert cube. Let X be the set of all sequences {x_m}, m ≥ 1, of real numbers 
satisfying 0 ≤ x_m ≤ 1 for all m, and put 

$$d(\{x_m\}, \{y_m\}) = \sum_{m=1}^{\infty} 2^{-m} |x_m - y_m|.$$

Then (X, d) is a metric space. To verify the triangle inequality, we can argue as 
follows: Let {x_m}, {y_m}, and {z_m} be in X. For each m, we have 

$$2^{-m}|x_m - y_m| \le 2^{-m}|x_m - z_m| + 2^{-m}|z_m - y_m|.$$

Thus 

$$\begin{aligned}
\sum_{m=1}^{N} 2^{-m}|x_m - y_m| &\le \sum_{m=1}^{N} 2^{-m}|x_m - z_m| + \sum_{m=1}^{N} 2^{-m}|z_m - y_m| \\
&\le \sum_{m=1}^{\infty} 2^{-m}|x_m - z_m| + \sum_{m=1}^{\infty} 2^{-m}|z_m - y_m|
\end{aligned}$$

for each N. Letting N tend to infinity yields the desired inequality. 
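
The partial sums used in the argument above can be computed directly. In the Python sketch below, sequences are represented as functions m ↦ x_m; that representation and all the names are assumptions made for illustration. Because 0 ≤ x_m, y_m ≤ 1, the terms omitted from the N-th partial sum add up to at most 2^{-N}, so a partial sum approximates d to within that amount.

```python
def partial_distance(x, y, N):
    """Sum of 2^(-m) |x(m) - y(m)| for m = 1, ..., N."""
    return sum(2.0 ** (-m) * abs(x(m) - y(m)) for m in range(1, N + 1))

def ones(m):
    return 1.0

def zeros(m):
    return 0.0

def recip(m):
    return 1.0 / m

# For the constant sequences 1 and 0 the partial sum is the geometric sum 1 - 2^(-N).
print(partial_distance(ones, zeros, 50) == 1.0 - 2.0 ** (-50))   # True

# Triangle inequality for the partial sums, with a small rounding allowance.
N = 60
lhs = partial_distance(ones, zeros, N)
rhs = partial_distance(ones, recip, N) + partial_distance(recip, zeros, N)
print(lhs <= rhs + 1e-12)
```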
(12) L^1 metric on Riemann integrable functions. Fix a nontrivial bounded 
interval [a, b] of the line, let X be the set of all Riemann integrable complex-valued 
functions on [a, b] in the sense of Chapter I, and define 

$$d_1(f, g) = \int_a^b |f(x) - g(x)|\,dx$$

for f and g in X. Then (X, d_1) is a pseudometric space. It can happen that 
$\int_a^b |f(x) - g(x)|\,dx = 0$ without f = g; for example, f could differ from g at 
a single point. Therefore d_1 is not a metric. 
(13) L^2 metric on complex-valued R[−π, π]. This example arose in the 
discussion of Fourier series in Section I.10, and it was convenient to include a 
factor 1/(2π) in front of integrals. Let X = R[−π, π], and define 

$$d_2(f, g) = \Big(\frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - g(x)|^2\,dx\Big)^{1/2}.$$

Then (X, d_2) is a pseudometric space. The triangle inequality was proved 
in Lemma 1.64 using the version of the Schwarz inequality in Lemma 1.63; that 
version of the Schwarz inequality needed a special argument given in Lemma 
1.62 in order to handle functions f whose norm satisfies ‖f‖_2 = 0. 


The constructions of metric spaces in Examples 1, 2, 6, and 13 are sufficiently 
similar to warrant abstracting what was involved. We start with a real or complex 
vector space V, possibly infinite-dimensional, and with a generalization (·, ·) 
of dot product. This generalization is a function from V × V to R in the case 
that V is real, and it is a function from V × V to C in the case that V is complex. 
We shall write the scalars as if they are complex, but only real scalars are to be 
used if the vector space is real. The function is written (·, ·) and is assumed to 
satisfy the following properties: 

(i) it is linear in the first variable, i.e., (x_1 + x_2, y) = (x_1, y) + (x_2, y) and 
(cx, y) = c(x, y), 
(ii) it is conjugate linear in the second variable, i.e., (x, y_1 + y_2) = 
(x, y_1) + (x, y_2) and $(x, cy) = \bar{c}(x, y)$, 
(iii) it is symmetric in the real case and Hermitian symmetric in the complex 
case, i.e., $(y, x) = \overline{(x, y)}$, 

(iv) it is definite, i.e., (x, x) > 0 if x ≠ 0. 

The form (·, ·) is called an inner product if V is real or complex and is often 
called also a Hermitian inner product if V is complex; in either case, V with the 
form is called an inner-product space. Two vectors x and y with (x, y) = 0 are 
said to be orthogonal; the notion of orthogonality generalizes perpendicularity 
in the case of the dot product. 
For either kind of scalars, we define ‖x‖ = (x, x)^{1/2}, and the function ‖·‖ 
is called the associated norm. We shall see shortly that a version of the Schwarz 
inequality is valid in this generality, the proof being no more complicated than 
the one in Section A5 of the appendix. 

In many cases in practice, item (iv) is replaced by the weaker condition that 

(iv′) (·, ·) is semidefinite, i.e., (x, x) ≥ 0 if x ≠ 0. 
This was what happened in Example 13 above. In order to have a name for 
this kind of space, let us call V with the semidefinite form (·, ·) a pseudo 
inner-product space. It is still meaningful to speak of orthogonality. It is still 
meaningful also to define ‖x‖ = (x, x)^{1/2}, and this is called the pseudonorm for 
the space. The Schwarz inequality is still valid, but its proof is more complicated 
than for an inner-product space. The extra complication was handled by Lemma 
1.62 in the case of Example 13 in order to obtain a little extra information; the 
general argument proceeds along different lines. 


Lemma 2.2 (Schwarz inequality). Let V be a pseudo inner-product space with 
form (·, ·). If x and y are in V, then |(x, y)| ≤ ‖x‖‖y‖. 

PROOF. First suppose that ‖y‖ ≠ 0. Then 

$$\begin{aligned}
0 &\le \big\| x - \|y\|^{-2}(x, y)\,y \big\|^2 = \big(x - \|y\|^{-2}(x, y)\,y,\ x - \|y\|^{-2}(x, y)\,y\big) \\
&= \|x\|^2 - 2\|y\|^{-2}|(x, y)|^2 + \|y\|^{-4}|(x, y)|^2\|y\|^2 = \|x\|^2 - \|y\|^{-2}|(x, y)|^2,
\end{aligned}$$

and the inequality follows in this case. 


Next suppose that ‖y‖ = 0. It is enough to prove that (x, y) = 0 for all x. If 
c is a real scalar, we have 

$$\|x + cy\|^2 = (x + cy, x + cy) = \|x\|^2 + 2\,\mathrm{Re}(x, cy) + c^2\|y\|^2 = \|x\|^2 + 2c\,\mathrm{Re}(x, y).$$

The left side is ≥ 0 as c varies, but the right side can be < 0 unless Re(x, y) = 
0. Thus we must have Re(x, y) = 0 for all x. Replacing x by ix gives us 
Im(x, y) = − Re i(x, y) = − Re(ix, y), and this we have just shown is 0 for all 
x. Thus Re(x, y) = Im(x, y) = 0, and (x, y) = 0. 
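
A small concrete instance may help with the second case of the proof. In the Python sketch below, V = C^2 carries the semidefinite form (x, y) = x_1 \bar{y_1}, which ignores the second coordinate; this particular form, the names, and the random test are assumptions chosen for illustration and do not come from the text.

```python
import math
import random

def form(x, y):
    """Semidefinite form on C^2 that sees only the first coordinate."""
    return x[0] * y[0].conjugate()

def pseudonorm(x):
    """Pseudonorm ||x|| = (x, x)^(1/2); the form (x, x) is real and nonnegative."""
    return math.sqrt(form(x, x).real)

y = (0j, 1 + 0j)                      # a nonzero vector with pseudonorm zero
print(pseudonorm(y))                  # zero, although y is not the zero vector
print(abs(form((2 + 3j, 5j), y)))     # zero, as the case ||y|| = 0 predicts

# Spot-check the Schwarz inequality on random vectors, with a rounding allowance.
rng = random.Random(0)
schwarz_ok = True
for _ in range(1000):
    u = (complex(rng.uniform(-1, 1), rng.uniform(-1, 1)),
         complex(rng.uniform(-1, 1), rng.uniform(-1, 1)))
    v = (complex(rng.uniform(-1, 1), rng.uniform(-1, 1)),
         complex(rng.uniform(-1, 1), rng.uniform(-1, 1)))
    schwarz_ok = schwarz_ok and abs(form(u, v)) <= pseudonorm(u) * pseudonorm(v) + 1e-12
print(schwarz_ok)
```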


Proposition 2.3 (triangle inequality). If V is a pseudo inner-product space 
with form (·, ·) and pseudonorm ‖·‖, then the pseudonorm satisfies 
(a) ‖x‖ ≥ 0 for all x ∈ V, 
(b) ‖cx‖ = |c|‖x‖ for all scalars c and all x ∈ V, 
(c) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x and y in V. 
Moreover, the definition d(x, y) = ||x — y|| makes V into a pseudometric space. 
The space V is a metric space if the pseudo inner-product space is an inner-product 
space. 
PROOF. Properties (a) and (b) of the pseudonorm are immediate, and (c) follows 
because 

$$\begin{aligned}
\|x + y\|^2 &= (x + y, x + y) = (x, x) + 2\,\mathrm{Re}(x, y) + (y, y) \\
&= \|x\|^2 + 2\,\mathrm{Re}(x, y) + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.
\end{aligned}$$

Putting x = a − c and y = c − b gives d(a, b) ≤ d(a, c) + d(c, b), and 
thus d satisfies the triangle inequality for a pseudometric. The other properties 
of a pseudometric are immediate from (a) and (b). If the form is definite and 
d(f, g) = 0, then (f − g, f − g) = 0 and hence the definiteness yields f − g = 0. 


EXAMPLES, continued. 

(14) Let us take double integrals of continuous functions over nice subsets of R^2 
as known. (The detailed study of general Riemann integrals in several variables 
occurs in Chapter III.) Let V be the complex vector space of all power series 
$F(z) = \sum_{n=0}^{\infty} c_n z^n$ with infinite radius of convergence. Since any such F(z) 
is bounded on the open unit disk D = {z ∈ C | |z| < 1}, the form 

$$(F, G) = \int_D F(z)\,\overline{G(z)}\,dx\,dy$$

is meaningful and makes V into an inner-product space. The 
proposition shows that V becomes a metric space with metric given by 

$$d(F, G) = \Big(\int_D |F(z) - G(z)|^2\,dx\,dy\Big)^{1/2}.$$


2. Open Sets and Closed Sets 


In this section we generalize the Euclidean notions of open set, closed set, 
neighborhood, interior, limit point, and closure so that they make sense for all 
pseudometric spaces, and we prove elementary properties relating these metric- 
space notions. In working with metric spaces and pseudometric spaces, it is often 
helpful to draw pictures as if the space in question were R^2, even computing 
distances that are right for R^2. We shall do that in the case of the first lemma but 
not afterward in this section. Let (X, d) be a pseudometric space. 


Lemma 2.4. If z is in the intersection of open balls B(r; x) and B(s; y), 
then there exists some t > 0 such that the open ball B(t; z) is contained in that 
intersection. Consequently the intersection of two open balls is open. 


REMARK. Figure 2.2 shows what B(t; z) looks like in the metric space R^2. 


PROOF. Take t = min{r − d(x, z), s − d(y, z)}. If w is in B(t; z), then the 
triangle inequality gives 


d(x, w) ≤ d(x, z) + d(z, w) < d(x, z) + t ≤ d(x, z) + (r − d(x, z)) = r, 


and hence w is in B(r; x). Similarly w is in B(s; y). 


FIGURE 2.2. Open ball contained in an intersection of two open balls. 


Proposition 2.5. The open sets of X have the properties that 


(a) X and the empty set ∅ are open, 
(b) an arbitrary union of open sets is open, 
(c) any finite intersection of open sets is open. 


PROOF. We know from Lemma 2.1 that a set is open if and only if it is the 
union of open balls. Then (b) is immediate, and (a) follows, since X is the union 
of all open balls and ∅ is an empty union. For (c), it is enough to prove that U ∩ V 
is open if U and V are open. Write U = ⋃_α B_α and V = ⋃_β B_β as unions of 
open balls. Then U ∩ V = ⋃_{α,β} (B_α ∩ B_β), and Lemma 2.4 shows that U ∩ V 
is exhibited as the union of open balls. Thus U ∩ V is open. 


A neighborhood of a point in X is any set that contains an open set containing 
the point. An open neighborhood is a neighborhood that is an open set.¹ A 
neighborhood of a subset E of X is a set that is a neighborhood of each point 
of E. If A is a subset of X, then the set A° of all points x in A for which A is 
a neighborhood of x is called the interior of A. For example, the interior of the 
half-open interval [a, b) of the real line is the open interval (a, b). 


Proposition 2.6. The interior of a subset A of X is the union of all open sets 
contained in A; that is, it is the largest open set contained in A. 


PROOF. Suppose that U ⊆ A is open. If x is in U, then U is an open 
neighborhood of x, and hence A is a neighborhood of x. Thus x is in A°, and A° 
contains the union of all open sets contained in A. For the reverse inclusion, let 
x be in A°. Then A is a neighborhood of x, and there exists an open subset U of 
A containing x. So x is contained in the union of all open sets contained in A. 


Corollary 2.7. A subset A of X is open if and only if A = A°. 


A subset F of X is closed if its complement is open. Every closed interval of 
the real line is closed. A half-open interval [a, b) on the real line is neither open 
nor closed if a and b are both finite. 


¹Some authors use the term “neighborhood” to mean what is here called “open neighborhood.” 
Proposition 2.8. The closed sets of X have the properties that 


(a) X and the empty set ∅ are closed, 
(b) an arbitrary intersection of closed sets is closed, 
(c) any finite union of closed sets is closed. 


PROOF. This result follows from Proposition 2.5 by taking complements. In 
(a), the complements of X and ∅ are ∅ and X, respectively. For (b) and (c), we use 
the formulas (⋂_α F_α)^c = ⋃_α F_α^c and (⋃_α F_α)^c = ⋂_α F_α^c for the complements 
of intersections and unions. 


If A is a subset of X, then x in X is a limit point of A if each neighborhood 
of x contains a point of A distinct from x. The closure² A^cl of A is the union of 
A with the set of all limit points of A. For example, the limit points of the set 
[a, b) ∪ {b + 1} on the real line are the points of the closed interval [a, b], and 
the closure of the set is [a, b] ∪ {b + 1}. 


Proposition 2.9. A subset A of X is closed if and only if it contains all its 
limit points. 


PROOF. Suppose A is closed, so that A^c is open. If x is in A^c, then A^c is 
an open neighborhood of x disjoint from A, so that x cannot be a limit point of 
A. Thus all limit points of A lie in A. In the reverse direction suppose that A 
contains all its limit points. If x is in A^c, then x is not a limit point of A, and 
hence there exists an open neighborhood of x lying completely in A^c. Since x is 
arbitrary, A^c is open, and thus A is closed. 


Proposition 2.10. The closure A^cl of a subset A of X is closed. The closure 
of A is the intersection of all closed sets containing A; that is, it is the smallest 
closed set containing A. 


PROOF. We shall apply Proposition 2.9. If x is given as a limit point of A^cl, 
we are to see that x is in A^cl. Assume the contrary. Then x is not in A, and x 
is not a limit point of A. Because of the latter condition, there exists an open 
neighborhood U of x that does not meet A except possibly in x. Because of the 
former condition, U does not meet A at all. Since x is a limit point of A^cl, U 
contains a point y of A^cl. Since U does not meet A, y has to be a limit point of 
A. Since U is an open neighborhood of y, U has to contain a point of A, and 
we have a contradiction. We conclude that x is in A^cl, and Proposition 2.9 shows 
that A^cl is closed. 

Any closed set F containing A contains all its limit points, by Proposition 2.9, 
and hence contains all the limit points of A. Thus F ⊇ A^cl. Since A^cl itself is a 
closed set containing A, it follows that A^cl is the smallest closed set containing A. 


²Some authors write Ā instead of A^cl for the closure of A. 
Corollary 2.11. A subset A of X is closed if and only if A = A^cl. Consequently 
(A^cl)^cl = A^cl for any subset A of X. 


Two remarks are in order. The first remark is that the proofs of all the results 
from Proposition 2.6 through Corollary 2.11 use only that the family of open 
subsets of X satisfies properties (a), (b), and (c) in Proposition 2.5 and do not 
actually depend on the precise definition of “open set.” This observation will be 
of importance to us in Chapter X, when properties (a), (b), and (c) will be taken 
as an axiomatic definition of a “topology” of open sets for X, and then all the 
results from Proposition 2.6 through Corollary 2.11 will still be valid. 

The second remark is that the mathematics of pseudometric spaces can always 
be reduced to the mathematics of metric spaces, and we shall normally therefore 
work only with metric spaces. The device for this reduction is given in the 
next proposition, which uses the notion of an equivalence relation. Equivalence 
relations are taken as known but are reviewed in Section A6 of the appendix. 


Proposition 2.12. Let (X, d) be a pseudometric space. If members x and y of 
X are called equivalent whenever d(x, y) = 0, then the result is an equivalence 
relation. Denote by [x] the equivalence class of x and by X_0 the set of all 
equivalence classes. The definition d_0([x], [y]) = d(x, y) consistently defines a 
function d_0 : X_0 × X_0 → R, and (X_0, d_0) is a metric space. A subset A is open in 
X if and only if two conditions are satisfied: A is a union of equivalence classes, 
and the set A_0 of such classes is an open subset of X_0. 


PROOF. The reflexive, symmetric, and transitive properties of the relation 
“equivalent” are immediate from the defining properties of a pseudometric. Let x and 
x′ be equivalent, and let y and y′ be equivalent. Then 


d(x, y) ≤ d(x, x′) + d(x′, y′) + d(y′, y) = 0 + d(x′, y′) + 0 = d(x′, y′), 


and similarly 
d(x′, y′) ≤ d(x, y). 


Thus d(x, y) = d(x′, y′), and d_0 is well defined. The properties showing that d_0 
is a metric are immediate from the corresponding properties for d. 

Next let x be in an open set A, and let x′ be equivalent to x. Since A is open, 
some open ball B(r; x) is contained in A. Since x′ has d(x, x′) = 0, x′ lies in 
B(r; x). Thus x′ lies in A, and A is a union of equivalence classes. 

Finally let A be any union of equivalence classes, and let A_0 be the set of those 
classes. If x is in A, then the set of points in some equivalence class lying in 
B(r; [x]) is just B(r; x), and it follows that A is open in X if and only if A_0 is 
open in X_0. 
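
The passage from a pseudometric space to its metric space of equivalence classes can be carried out explicitly on a finite example. In the Python sketch below, the pseudometric d(p, q) = |p_1 − q_1| on points of the plane, the sample points, and the helper names are all assumptions chosen for illustration.

```python
def d(p, q):
    """Pseudometric on the plane that ignores the second coordinate."""
    return abs(p[0] - q[0])

def equivalence_classes(points, d):
    """Group points into classes, two points being equivalent when d is 0."""
    classes = []
    for p in points:
        for cls in classes:
            if d(p, cls[0]) == 0:
                cls.append(p)
                break
        else:
            classes.append([p])   # p starts a new class
    return classes

def d0(cls_a, cls_b, d):
    """Induced distance between classes, computed from one representative each."""
    return d(cls_a[0], cls_b[0])

points = [(0, 0), (0, 5), (1, 2), (1, -3), (4, 1)]
classes = equivalence_classes(points, d)
print(classes)   # [[(0, 0), (0, 5)], [(1, 2), (1, -3)], [(4, 1)]]

# d0 does not depend on the representatives chosen.
well_defined = all(d(p, q) == d0(a, b, d)
                   for a in classes for b in classes for p in a for q in b)
print(well_defined)   # True
```

Distinct classes are at positive distance from one another, which is the property that makes d_0 a metric rather than merely a pseudometric.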
Before we discuss continuous functions between metric spaces, let us take note 
of some properties of inverse images for abstract functions as listed in Section A1 
of the appendix. If f : X → Y is a function between two sets X and Y and 
E is a subset of Y, we denote by f^{-1}(E) the inverse image of E under f, i.e., 
{x ∈ X | f(x) ∈ E}. The properties are that inverse images of functions respect 
unions, intersections, and complements. 

Let (X, d) and (Y, ρ) be metric spaces. A function f : X → Y is continuous 
at a point x ∈ X if for each ε > 0, there is a δ > 0 such that ρ(f(x), f(y)) < ε 
whenever d(x, y) < δ. This definition is consistent with the definition when 
(X, d) and (Y, ρ) are both equal to R with the usual metric. 


Proposition 2.13. If (X, d) and (Y, ρ) are metric spaces, then a function 
f : X → Y is continuous at the point x ∈ X if and only if for any open 
neighborhood V of f(x) in Y, there is a neighborhood U of x such that f(U) ⊆ V. 


PROOF. Let f be continuous at x and let V be given. Choose ε > 0 such that 
B(ε; f(x)) is contained in V, and choose δ > 0 such that ρ(f(x), f(y)) < ε 
whenever d(x, y) < δ. Then y ∈ B(δ; x) implies f(y) ∈ B(ε; f(x)) ⊆ V. 
Thus U = B(δ; x) has f(U) ⊆ V. 

Conversely suppose that f satisfies the condition in the statement of the 
proposition. Let ε > 0 be given, and choose a neighborhood U of x such 
that f(U) ⊆ B(ε; f(x)). Since U is a neighborhood of x, we can find an 
open ball B(δ; x) lying in U. Then f(B(δ; x)) ⊆ B(ε; f(x)), and hence 
ρ(f(x), f(y)) < ε whenever d(x, y) < δ. 


Corollary 2.14. Let f : X → Y and g : Y → Z be functions between metric 
spaces. If f is continuous at x and g is continuous at f(x), then the composition 
g ∘ f, given by (g ∘ f)(x) = g(f(x)), is continuous at x. 


PROOF. Let W be an open neighborhood of g(f(x)). By continuity of g at 
f(x), we can choose a neighborhood V of f(x) such that g(V) ⊆ W. Possibly 
by passing to a subset of V, we may assume that V is an open neighborhood of 
f(x). By continuity of f at x, we can choose a neighborhood U of x such that 
f(U) ⊆ V. Then g(f(U)) ⊆ W. Taking Proposition 2.13 into account, we see 
that g ∘ f is continuous at x. 


Proposition 2.15. If (X, d) and (Y, ρ) are metric spaces and f is a function 
from X into Y, then the following are equivalent: 
(a) the function f is continuous at every point of X, 
(b) the inverse image under f of every open set in Y is open in X, 
(c) the inverse image under f of every closed set in Y is closed in X. 
PROOF. Suppose (a) holds. If V is open in Y and x is in f^{-1}(V), then f(x) is 
in V. Since f is continuous at x by (a), Proposition 2.13 gives us a neighborhood 
U of x, which we may take to be open, such that f(U) ⊆ V. Then we have 
x ∈ U ⊆ f^{-1}(V). Since x is arbitrary in f^{-1}(V), f^{-1}(V) is open. Thus (b) 
holds. In the reverse direction, suppose (b) holds. Let x in X be given, and let V 
be an open neighborhood of f(x). By (b), U = f^{-1}(V) is open, and U is then 
an open neighborhood of x mapping into V. This proves (a), and thus (a) and (b) 
are equivalent. Conditions (b) and (c) are equivalent, since f^{-1}(V)^c = f^{-1}(V^c). 


A function f : X → Y that is continuous at every point of X, as in Proposition 
2.15, will simply be said to be continuous. A function f : X → Y is a 
homeomorphism if f is continuous, if f is one-one and onto, and if f^{-1} : Y → X 
is continuous. The relation “is homeomorphic to” is an equivalence relation. 
Namely, the identity function shows that the relation is reflexive, the symmetry of 
the relation is built into the definition, and the transitivity follows from Corollary 
2.14. 

If (X, d) is a metric space and if A is a nonempty subset of X, then the distance 
from x to A, denoted by D(x, A), is defined by 

$$D(x, A) = \inf_{y \in A} d(x, y).$$


Proposition 2.16. Let A be a fixed nonempty subset of a metric space (X, d). 
Then the real-valued function f defined on X by f(x) = D(x, A) is continuous. 


PROOF. If x and y are in X and z is in A, then the triangle inequality gives 
D(x, A) ≤ d(x, z) ≤ d(x, y) + d(y, z). 


Taking the infimum over z gives D(x, A) ≤ d(x, y) + D(y, A). Reversing the 
roles of x and y, we obtain D(y, A) ≤ d(x, y) + D(x, A), since d(y, x) = 
d(x, y). Therefore 


|f(x) − f(y)| = |D(x, A) − D(y, A)| ≤ d(x, y). 


Fix x, let ε > 0 be given, and take δ = ε. If d(x, y) < δ = ε, then our inequality 
gives us |f(x) − f(y)| < ε. Hence f is continuous at x. Since x is arbitrary, f 
is continuous. 
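
The inequality |D(x, A) − D(y, A)| ≤ d(x, y) obtained in the proof can be observed numerically. The Python sketch below assumes a finite set A on the real line, so that the infimum is a minimum; the set A, the grid of test points, and the function names are illustrative choices and not part of the text. The grid consists of multiples of 1/4, on which the floating-point arithmetic is exact.

```python
def dist_to_set(x, A, d):
    """D(x, A) = inf of d(x, y) over y in A; for a finite set A this is a minimum."""
    return min(d(x, y) for y in A)

def d(x, y):
    """The usual metric on the real line."""
    return abs(x - y)

A = [0.0, 1.0, 4.0]
grid = [k / 4 for k in range(-8, 25)]   # the points -2.0, -1.75, ..., 6.0

lipschitz_ok = all(
    abs(dist_to_set(x, A, d) - dist_to_set(y, A, d)) <= d(x, y)
    for x in grid for y in grid
)
print(dist_to_set(2.0, A, d))   # the nearest point of A is 1.0, so this is 1.0
print(lipschitz_ok)             # True
```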


Corollary 2.17. If (X, d) is a metric space, then the real-valued function 
d(x, y) for fixed x is continuous in y. 


PROOF. This is the special case of the proposition in which A is the set {x}, 
since D(y, {x}) = d(x, y). 
Corollary 2.18. Let (X, d) be a metric space, and let x be in X. Then the 
closed ball {y ∈ X | d(x, y) ≤ r} is a closed set. 


REMARK. Nevertheless, the closed ball is not necessarily the closure of the 
open ball B(r; x) = {y ∈ X | d(x, y) < r}. A counterexample is provided by 
any open ball of radius 1 in a space with the discrete metric. 


PROOF. If f(y) = d(x, y), the set in question is f^{-1}([0, r]). Corollary 2.17 
says that f is continuous, and the equivalence of (a) and (c) in Proposition 2.15 
shows that the set in question is closed. 


Proposition 2.19. If A is a nonempty subset of a metric space (X, d), then 
A^cl = {x | D(x, A) = 0}. 


PROOF. The set {x | D(x, A) = 0} is closed by Propositions 2.16 and 2.15, 
and it contains A. By Proposition 2.10 it contains A^cl. For the reverse inclusion, 
suppose x is not in A^cl, hence that x is not in A and x is not a limit point of A. 
These conditions imply that there is some ε > 0 such that B(ε; x) is disjoint 
from A, hence that d(x, y) ≥ ε for all y in A. Taking the infimum over y gives 
D(x, A) ≥ ε > 0. Hence D(x, A) ≠ 0. 


4. Sequences and Convergence 


For a set S, we have already defined in Section I.1 the notion of a sequence in S 
as a function from a certain kind of subset of integers into S. In this section we 
work with sequences in metric spaces. 

A sequence {x_n} in a metric space (X, d) is eventually in a subset A of X if 
there is an integer N such that x_n is in A whenever n ≥ N. The sequence {x_n} 
converges to a point x in X if the sequence is eventually in each neighborhood 
of x. It is apparent that if {x_n} converges to x, then so does every subsequence 
{x_{n_k}}. 


Proposition 2.20. If (X,d) is a metric space, then no sequence in X can 
converge to more than one point. 


PROOF. Suppose on the contrary that {x_n} converges to distinct points x and 
y. The number m = d(x, y) is then > 0. By the assumed convergence, x_n lies 
in both open balls B(m/2; x) and B(m/2; y) if n is large enough. Thus x_n lies in the 
intersection of these balls. But this intersection is empty, since the presence of a 
point z in both balls would mean that d(x, y) ≤ d(x, z) + d(z, y) < m/2 + m/2 = m, 
contradiction. 

If a sequence $\{x_n\}$ in a metric space (X, d) converges to x, we shall call x the 
limit of the sequence and write $\lim_{n \to \infty} x_n = x$ or $\lim_n x_n = x$ or $\lim x_n = x$ or 
$x_n \to x$. A sequence has at most one limit, by Proposition 2.20. If the definition 
of convergence is extended to pseudometric spaces, then sequences need not have 
unique limits. 

Let us identify convergent sequences in some of the examples of metric spaces 
in Section 1. 


EXAMPLES OF CONVERGENCE IN METRIC SPACES. 


(0) The real line. On R with the usual metric, the convergent sequences are 
the sequences convergent in the usual sense of Section I.1. 


(1) Euclidean space $\mathbb{R}^n$. Here the metric is given by 

$$d(x, y) = \Big( \sum_{k=1}^{n} (x_k - y_k)^2 \Big)^{1/2}$$

if $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$. Another metric $d'(x, y)$ is given by 

$$d'(x, y) = \max_{1 \le k \le n} |x_k - y_k|,$$

and we readily check that 

$$d'(x, y) \le d(x, y) \le \sqrt{n}\, d'(x, y).$$

From this inequality it follows that the convergent sequences in $(\mathbb{R}^n, d)$ are the 
same as the convergent sequences in $(\mathbb{R}^n, d')$. On the other hand, the definition 
of $d'$ as a maximum means that we have convergence in $(\mathbb{R}^n, d')$ if and only if 
we have ordinary convergence in each entry. Thus convergence of a sequence of 
vectors in $(\mathbb{R}^n, d)$ means convergence in the kth entry for all k with $1 \le k \le n$. 


(2) Complex Euclidean space $\mathbb{C}^n$. As a metric space, $\mathbb{C}^n$ gets identified with 
$\mathbb{R}^{2n}$. Thus a sequence of vectors in $\mathbb{C}^n$ converges if and only if it converges entry 
by entry. 

(3) Extended real line $\mathbb{R}^*$. Here the metric is given by $d(x, y) = |f(x) - f(y)|$ 
with $f(x) = x/(1 + |x|)$ if x is in R, $f(-\infty) = -1$, and $f(+\infty) = +1$. We 
saw in Section 1 that the intersections with R of the open balls of $\mathbb{R}^*$ are the 
open intervals in R. Thus convergence of a sequence in $\mathbb{R}^*$ to a point x in R 
means that the sequence is eventually in $(-\infty, +\infty)$ and thereafter is an ordinary 
convergent sequence in R. Convergence to $+\infty$ of a sequence $\{x_n\}$ means that for 
each real number M, there is an integer N such that $x_n \ge M$ whenever $n \ge N$. 
Convergence to $-\infty$ is analogous. 

(4) Bounded scalar-valued functions on S in the uniform metric. A sequence 
$\{f_n\}$ in B(S) converges in the uniform metric on B(S) if and only if $\{f_n\}$ converges 
uniformly, in the sense below, to some member f of B(S). The definition of 
uniform convergence here is the natural generalization of the one in Section I.3: 
$\{f_n\}$ converges to f uniformly if for each $\epsilon > 0$, there is an integer N such that 
$n \ge N$ implies $|f_n(s) - f(s)| < \epsilon$ for all s simultaneously. An important fact 
in this case is that the sequence $\{f_n\}$ is uniformly bounded, i.e., that there exists 
a real number M such that $|f_n(s)| \le M$ for all n and s. In fact, choose some 
integer N for $\epsilon = 1/2$, so that $|f_n(s) - f_N(s)| \le |f_n(s) - f(s)| + |f(s) - f_N(s)| < 1$ 
for all s if $n \ge N$. Then the triangle inequality gives 

$$|f_n(s)| \le |f_n(s) - f_N(s)| + |f_N(s)| \le 1 + |f_N(s)|$$

for all s if $n \ge N$, so that M can be taken to be $\max_{1 \le n \le N} \{\sup_{s \in S} |f_n(s)|\} + 1$. 


(5) Bounded functions from S into a metric space $(R, \rho)$. Convergence here 
is the expected generalization of uniform convergence: $\{f_n\}$ converges to f 
uniformly if for each $\epsilon > 0$, there is an integer N such that $n \ge N$ implies 
$\rho(f_n(s), f(s)) < \epsilon$ for all s simultaneously. As in Example 4, a uniformly 
convergent sequence of bounded functions is uniformly bounded in the sense 
that $\rho(f_n(s), r_0) \le M$ for all n and s, M being some real number. Here $r_0$ is any 
fixed member of R. 


(7) Indiscrete space X. The function d(x, y) in this case is a pseudometric, not 
a metric, unless X has only one point. Every sequence in X converges to every 
point in X. 

(8) Discrete metric. Convergence of a sequence $\{x_n\}$ in a space X with the 
discrete metric means that $\{x_n\}$ is eventually constant. 


(11) Hilbert cube. For each n, let $(\{x_m\}_{m=1}^{\infty})_n$ be a member of the Hilbert cube, 
and write $x_{m,n}$ for the mth term of the nth sequence. As n varies, the sequence of 
sequences converges if and only if $\lim_n x_{m,n}$ exists for each m. 


(12) $L^1$ metric on Riemann integrable functions. The function d(f, g) defined 
in this case is a pseudometric, not a metric. Convergence in the corresponding 
metric space as in Proposition 2.12 therefore really means a certain kind of 
convergence of equivalence classes: If $\{f_n\}$ and f are given, the sequence of classes 
$\{[f_n]\}$ converges to the class $[f]$ if and only if $\lim_n \int_a^b |f_n(x) - f(x)|\, dx = 0$. 
The use of classes in the notation is rather cumbersome and not very helpful, and 
consequently it is common practice to treat the $L^1$ space as a metric space and to 
work with its members as if they were functions rather than equivalence classes. 
We return to this point in Chapter V. 
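
As an informal numerical companion to Examples 1 and 3, the following Python sketch checks the comparison $d' \le d \le \sqrt{n}\, d'$ on random vectors and evaluates the extended-real-line metric along the sequence $x_n = n$. The function names, the sample sizes, and the rounding tolerance are illustrative assumptions, not part of the text, and a finite random check is of course not a proof.

```python
import math
import random


def d_euclid(x, y):
    """Euclidean metric d on R^n (Example 1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def d_max(x, y):
    """Maximum metric d' on R^n (Example 1)."""
    return max(abs(a - b) for a, b in zip(x, y))


def comparison_holds(n=5, trials=1000, seed=0, tol=1e-12):
    """Check d' <= d <= sqrt(n) * d' on random pairs, up to rounding error."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        y = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        de, dm = d_euclid(x, y), d_max(x, y)
        if dm > de + tol or de > math.sqrt(n) * dm + tol:
            return False
    return True


def f_ext(x):
    """The map f of Example 3, with f(+inf) = +1 and f(-inf) = -1."""
    if x == math.inf:
        return 1.0
    if x == -math.inf:
        return -1.0
    return x / (1.0 + abs(x))


def d_ext(x, y):
    """Metric on the extended real line: |f(x) - f(y)|."""
    return abs(f_ext(x) - f_ext(y))


if __name__ == "__main__":
    print("comparison holds on samples:", comparison_holds())
    # The distance from n to +infinity equals 1/(1 + n), which tends to 0.
    for n in (1, 10, 100, 1000):
        print(n, d_ext(float(n), math.inf))
```

In the second part, the printed distances shrink like $1/(1+n)$, which matches the description of convergence to $+\infty$ in Example 3.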


Let us elaborate a little on Examples 4 and 5, concerning the space B(S) 
of bounded scalar-valued functions on a set S or, more generally, the space 
of bounded functions from S into a metric space $(R, \rho)$. Suppose that S has 
the additional structure of a metric space (S, d). We let C(S) be the subset of 
B(S) consisting of bounded continuous functions on S, and we write $C(S, \mathbb{R})$ or 
$C(S, \mathbb{C})$ if we want to be explicit about the range. More generally we consider 
the space of bounded continuous functions from S into the metric space R. All 
of these are metric spaces in their own right. 


Proposition 2.21. Let (S, d) and $(R, \rho)$ be metric spaces, let $x_0$ be in S, and 
let $f_n : S \to R$ be a sequence of bounded functions from S into R that converge 
uniformly to $f : S \to R$ and are continuous at $x_0$. Then f is continuous at $x_0$. 
In particular, the uniform limit of continuous functions is continuous. 


PROOF. For x in S, we write 

$$\rho(f(x), f(x_0)) \le \rho(f(x), f_n(x)) + \rho(f_n(x), f_n(x_0)) + \rho(f_n(x_0), f(x_0)).$$

Given $\epsilon > 0$, we choose an integer N by the uniform convergence such that the 
first and third terms on the right side are $< \epsilon$ for $n \ge N$. With N fixed, we choose 
$\delta > 0$ by the continuity of $f_N$ at $x_0$ such that $\rho(f_N(x), f_N(x_0)) < \epsilon$ whenever 
$d(x, x_0) < \delta$. Then the displayed inequality with n = N shows that $d(x, x_0) < \delta$ implies 
$\rho(f(x), f(x_0)) < 3\epsilon$, and the proposition follows. 
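
To see why uniformity matters in Proposition 2.21, it may help to look at the standard example $f_n(x) = x^n$ on [0, 1], whose pointwise limit is discontinuous at 1. The sketch below is an illustration under stated assumptions: the example itself, the helper names, and the grid size are not taken from the text, and the grid maximum is only a lower bound for the true supremum.

```python
def f_n(n, x):
    """The continuous function f_n(x) = x**n on [0, 1]."""
    return x ** n


def pointwise_limit(x):
    """Pointwise limit of f_n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def grid_sup_distance(n, grid_size=10001):
    """Lower bound for sup over [0, 1] of |f_n - f|, taken on a uniform grid."""
    xs = [k / (grid_size - 1) for k in range(grid_size)]
    return max(abs(f_n(n, x) - pointwise_limit(x)) for x in xs)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # At each fixed x < 1 the values x**n tend to 0, yet the grid
        # maximum of |f_n - f| does not shrink toward 0.
        print(n, f_n(n, 0.5), grid_sup_distance(n))
```

The distances in the uniform metric stay bounded away from 0, so the sequence does not converge in B([0, 1]), and the proposition does not apply; this is consistent with the limit failing to be continuous.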


We conclude this section with some elementary results involving convergence 
of sequences in metric spaces. 


Proposition 2.22. If (X, d) is a metric space, then 


(a) for any subset A of X and limit point x of A, there exists a sequence in 
$A - \{x\}$ converging to x, 

(b) any convergent sequence in X with limit $x \in X$ either has infinite image, 
with x as a limit point of the image, or else is eventually constantly equal 
to x. 


REMARK. This result and the first corollary below are used frequently — and 
often without specific reference. 


PROOF OF (a). For each $n \ge 1$, the open ball $B(1/n; x)$ is an open neighborhood 
of x and must contain a point $x_n$ of A distinct from the limit point x. Then 
$d(x_n, x) < 1/n$, and thus $\lim x_n = x$. Hence $\{x_n\}$ is the required sequence. 

PROOF OF (b). Suppose that $\{x_n\}$ converges to x and has infinite image. By 
discarding the terms equal to x, we obtain a subsequence $\{x_{n_k}\}$ with limit x. If 
U is an open neighborhood of x, then $\{x_{n_k}\}$ is eventually in U, by the assumed 
convergence. Since no term of the subsequence equals x, U contains a member 
of the image of $\{x_n\}$ different from x. Thus x is a limit point of the image of $\{x_n\}$. 

Now suppose that $\{x_n\}$ converges to x and has finite image $\{p_1, \dots, p_r\}$. If 
$x_n$ is equal to some particular $p_{j_0}$ for infinitely many n, then $\{x_n\}$ has an infinite 
subsequence converging to $p_{j_0}$. Since $\{x_n\}$ converges to x, every convergent 
subsequence converges to x. Therefore $p_{j_0} = x$. For $j \ne j_0$, only finitely many 
$x_n$ can then equal $p_j$, and it follows that $\{x_n\}$ is eventually constantly equal to 
$p_{j_0} = x$. 


Corollary 2.23. If (X, d) is a metric space, then a subset F of X is closed if 
and only if every convergent sequence in F has its limit in F. 


PROOF. Suppose that F is closed and $\{x_n\}$ is a convergent sequence in F with 
limit x. By Proposition 2.22b, either x is in the image of the sequence or x is 
a limit point of the image of the sequence. In either case, x is in F; thus the limit of any 
convergent sequence in F is in F. 

Conversely suppose every convergent sequence in F has its limit in F. If x 
is a limit point of F, then Proposition 2.22a produces a sequence in $F - \{x\}$ 
converging to x. By assumption, the limit x is in F. Therefore F contains all its 
limit points and is closed. 


Corollary 2.24. If (S,d) is a metric space, then the set C(S) of bounded 
continuous scalar-valued functions on S is a closed subset of the metric space 
B(S) of all bounded scalar-valued functions on S. 


PROOF. Proposition 2.21 shows for any sequence in C(S) convergent in B(S) 
that the limit is actually in C(S). By Corollary 2.23, C(S) is closed in B(S). 


Proposition 2.25. Let $f : X \to Y$ be a function between metric spaces. Then 
f is continuous at a point x in X if and only if whenever $\{x_n\}$ is a convergent 
sequence in X with limit x, then $\{f(x_n)\}$ is convergent in Y with limit f(x). 


REMARK. In the special case of domain and range R, this result was mentioned 
in Section I.1 after the definition of continuity. We deferred the proof of the special 
case until now to avoid repetition. 


PROOF. Suppose that f is continuous at x and that $\{x_n\}$ is a convergent sequence 
in X with limit x. Let V be any open neighborhood of f(x). By continuity, there 
exists an open neighborhood U of x such that $f(U) \subseteq V$. Since $x_n \to x$, there 
exists N such that $x_n$ is in U whenever $n \ge N$. Then $f(x_n)$ is in $f(U) \subseteq V$ 
whenever $n \ge N$. Hence $\{f(x_n)\}$ converges to f(x). 

Conversely suppose that $x_n \to x$ always implies $f(x_n) \to f(x)$. We are to 
show that f is continuous at x. Let V be an open neighborhood of f(x). We are to 
show that some open neighborhood of x maps into V under f. Assuming the 
contrary, we can find, for each $n \ge 1$, some $x_n$ in $B(1/n; x)$ such that $f(x_n)$ is 
not in V. Then $x_n \to x$, but the distance of $f(x_n)$ from f(x) is bounded away 
from 0. Thus $f(x_n)$ cannot converge to f(x). This is a contradiction, and we 
conclude that some $B(1/n; x)$ maps into V under f; since V is arbitrary, f is 
continuous at x. 
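
The sequential criterion of Proposition 2.25 can be tried out numerically. In the sketch below, which is an illustration rather than part of the text, the step function, the squaring function, and the sequence $x_n = 1/n$ are all assumed examples, and only finitely many terms are inspected.

```python
def step(x):
    """A function on R that is discontinuous at 0."""
    return 1.0 if x > 0 else 0.0


def square(x):
    """A function on R that is continuous everywhere."""
    return x * x


def image_distances(f, x, terms):
    """Distances |f(x_n) - f(x)| along the given terms x_n."""
    return [abs(f(t) - f(x)) for t in terms]


# The first 1000 terms of the sequence x_n = 1/n, which converges to 0.
x_terms = [1.0 / n for n in range(1, 1001)]

if __name__ == "__main__":
    # For the step function the distances stay equal to 1, so f(x_n)
    # does not converge to f(0); for the square they shrink toward 0.
    print(image_distances(step, 0.0, x_terms)[-1])
    print(image_distances(square, 0.0, x_terms)[-1])
```

A single sequence witnessing failure is enough to rule out continuity at the point, whereas confirming continuity requires the condition for every convergent sequence.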


5. Subspaces and Products 


When working with functions on the real line, one frequently has to address 
situations in which the domain of the function is just an open interval or a closed 
interval, rather than the whole line. When one uses the €-6 definition of continuity, 
the subject does not become much more cumbersome, but it can become more 
cumbersome if one uses some other definition, such as one involving limits. The 
theory of metric spaces has a device for addressing smaller domains than the 
whole space—the notion of a subspace—and then the theory of functions on a 
subspace stands on an equal footing with the theory of functions on the whole 
space. 

Let (X, d) be a metric space, and let A be a nonempty subset of X. There is 
a natural way of making A into a metric space, namely by taking the restriction 
$d|_{A \times A}$ as a metric for A. When we do so, we speak of A as a subspace of X. 
When there is a need to be more specific, we may say that A is a metric subspace 
of X. If A is an open subset of X, we may say that A is an open subspace; if A 
is a closed subset of X, we may say that A is a closed subspace. 


Proposition 2.26. If A is a subspace of a metric space (X, d), then the open 
sets of A are exactly all sets $U \cap A$, where U is open in X, and the closed sets of 
A are all sets $F \cap A$, where F is closed in X. 


PROOF. The open balls in A are the intersections with A of the open balls of 
X, and the statement about open sets follows by taking unions. The closed sets 
of A are the complements within A of all the open sets of A, thus all sets of the 
form $A - (U \cap A)$ with U open in X. Since $A - (U \cap A) = A \cap U^c$, the statement 
about closed sets follows. 


Corollary 2.27. If A is a subspace of (X, d) and if $f : X \to Y$ is continuous 
at a point a of A, then the restriction $f|_A$, mapping A into Y, is continuous at a. 
Also, f is continuous at a if and only if the function $f_0 : X \to f(X)$ obtained 
by redefining the range to be the image is continuous at a. 


PROOF. Let V be an open neighborhood of f(a) in Y. By continuity of f at a 
as a function on X, choose an open neighborhood U of a in X with $f(U) \subseteq V$. 
Then $U \cap A$ is an open neighborhood of a in A, and $f(U \cap A) \subseteq V$. Hence $f|_A$ 
is continuous at a. 

The most general open neighborhood of f(a) in f(X) is of the form $V \cap f(X)$ 
with V an open neighborhood of f(a) in Y. Since $f^{-1}(V) = f_0^{-1}(V \cap f(X))$, 
the condition for continuity of $f_0$ at a is the same as the condition for continuity 
of f at a. 


We now turn our attention to product spaces. Product spaces are a convenient 
device for considering functions of several variables. 

If (X, d) and (Y, d') are metric spaces, there are several natural ways of making 
the product set $X \times Y$, the set of ordered pairs with the first member from X and 
the second from Y, into a metric space, but all such ways lead to the same class 
of open sets and therefore also the same class of convergent sequences. We 
discussed an instance of this phenomenon in Example 1 of Section 4. For general 
X and Y, three such metrics on $X \times Y$ are 

$$\rho_1((x_1, y_1), (x_2, y_2)) = d(x_1, x_2) + d'(y_1, y_2),$$
$$\rho_2((x_1, y_1), (x_2, y_2)) = \big(d(x_1, x_2)^2 + d'(y_1, y_2)^2\big)^{1/2},$$
$$\rho_\infty((x_1, y_1), (x_2, y_2)) = \max\{d(x_1, x_2), d'(y_1, y_2)\}.$$

Each satisfies the defining properties of a metric. Simple algebra gives 

$$\max\{a, b\} \le (a^2 + b^2)^{1/2} \le a + b \le 2\max\{a, b\}$$

whenever a and b are nonnegative reals, and therefore 

$$\rho_\infty \le \rho_2 \le \rho_1 \le 2\rho_\infty.$$
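
The "simple algebra" behind the chain of inequalities can be written out as follows. This is one possible verification, offered as a sketch; the normalization $a \ge b$ is an assumption made only to shorten the argument.

```latex
% Assume a, b >= 0 and, without loss of generality, a >= b, so that max{a,b} = a.
\begin{align*}
\max\{a,b\}^2 = a^2 &\le a^2 + b^2, \\
a^2 + b^2 &\le a^2 + 2ab + b^2 = (a+b)^2, \\
a + b &\le a + a = 2\max\{a,b\}.
\end{align*}
% Taking nonnegative square roots in the first two lines yields
%   max{a,b} <= (a^2 + b^2)^{1/2} <= a + b <= 2 max{a,b}.
% Substituting a = d(x_1, x_2) and b = d'(y_1, y_2) gives
%   rho_infty <= rho_2 <= rho_1 <= 2 rho_infty.
```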


Let us check that this chain of inequalities implies that the neighborhoods of 
a point $(x_0, y_0)$ are the same in all three metrics, hence that the open sets are the 
same in all three metrics. For any r > 0, the open balls about $(x_0, y_0)$ in the three 
metrics satisfy 

$$B_1(r; (x_0, y_0)) \subseteq B_2(r; (x_0, y_0)) \subseteq B_\infty(r; (x_0, y_0)) \subseteq B_1(2r; (x_0, y_0)).$$

The first and second inclusions show that open balls about $(x_0, y_0)$ in the metrics 
$\rho_2$ and $\rho_\infty$ are neighborhoods of $(x_0, y_0)$ in the metric $\rho_1$. Similarly the second and 
third inclusions show that open balls in the metrics $\rho_\infty$ and $\rho_1$ are neighborhoods 
in the metric $\rho_2$, and the third and first inclusions show that open balls in the 
metrics $\rho_1$ and $\rho_2$ are neighborhoods in the metric $\rho_\infty$. 

We shall refer to the metric $\rho_\infty$ as the product metric for $X \times Y$. If $X \times Y$ is 
being regarded as a metric space and no metric has been mentioned, $\rho_\infty$ is to be 
understood. But it is worth keeping in mind that $\rho_1$ and $\rho_2$ yield the same open 
sets. In the case of Euclidean space, it is the metric $\rho_2$ on $\mathbb{R}^n \times \mathbb{R}^m$ that gives the 
Euclidean metric on $\mathbb{R}^{n+m}$; thus the product metric and the Euclidean metric are 
distinct but yield the same open sets. 

A sequence $\{(x_n, y_n)\}$ in the product metric converges to $(x_0, y_0)$ in $X \times Y$ if 
and only if $\{x_n\}$ converges to $x_0$ and $\{y_n\}$ converges to $y_0$. Since the three metrics 
on $X \times Y$ yield the same convergent sequences, this statement is valid in the 
metrics $\rho_1$ and $\rho_2$ as well. 

It is an elementary property of the arithmetic operations in R that if $\{x_n\}$ 
converges to $x_0$ and $\{y_n\}$ converges to $y_0$, then $\{x_n + y_n\}$ converges to $x_0 + y_0$. 
Similar statements apply to subtraction, multiplication, maximum, and minimum, 
and then to absolute value and to division except where division by 0 is involved. 
Further similar statements apply to those operations on vectors that make sense. 
Applying Proposition 2.25, we obtain (a) through (e) in the following proposition. 
Conclusions (a') through (e') are proved similarly. 


Proposition 2.28. The following operations are continuous: 


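(a) addition and subtraction from $\mathbb{R}^n \times \mathbb{R}^n$ into $\mathbb{R}^n$, 

(b) scalar multiplication from $\mathbb{R} \times \mathbb{R}^n$ into $\mathbb{R}^n$, 

(c) the map $x \mapsto x^{-1}$ from $\mathbb{R} - \{0\}$ to $\mathbb{R} - \{0\}$, 

(d) the map $x \mapsto |x|$ from $\mathbb{R}^n$ to $\mathbb{R}$, 

(e) the operations from $\mathbb{R}^2$ to $\mathbb{R}$ of taking the maximum of two real numbers 
and taking the minimum of two real numbers, 

(a') addition and subtraction from $\mathbb{C}^n \times \mathbb{C}^n$ into $\mathbb{C}^n$, 

(b') scalar multiplication from $\mathbb{C} \times \mathbb{C}^n$ into $\mathbb{C}^n$, 

(c') the map $x \mapsto x^{-1}$ from $\mathbb{C} - \{0\}$ to $\mathbb{C} - \{0\}$, 

(d') the map $x \mapsto |x|$ from $\mathbb{C}^n$ to $\mathbb{R}$, 

(e') the map $x \mapsto \bar{x}$ from $\mathbb{C}$ to $\mathbb{C}$. 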


Corollary 2.29. Let (X, d) be a metric space, and let f and g be continuous 
functions from X into $\mathbb{R}^n$ or $\mathbb{C}^n$. If c is a scalar, then $f + g$, $cf$, $f - g$, and $|f|$ are 
continuous. If n = 1, then the product fg is continuous, and the function 1/f is 
continuous on the set where f is not zero. If n = 1 and the functions take values 
in R, then max{f, g} and min{f, g} are continuous. If n = 1 and the functions 
take values in C, then the complex conjugate $\bar{f}$ is continuous. 


REMARKS. If (S,d) is a metric space, then it follows that the metric space 
C(S) of bounded continuous scalar-valued functions on S is a vector space. As 
such, it is a vector subspace of the metric space B(S) of bounded scalar-valued 
functions on S, and it is a metric subspace as well.³ 


³ The word “subspace” can now be used in two senses, that of a metric subspace of a metric space 
and that of a vector subspace of a vector space. The latter kind of subspace we shall always refer to 
as a “vector subspace,” retaining the word “vector” for clarity. A “closed vector subspace” of B(S) 
then has to mean a closed metric subspace that is also a vector subspace. 

PROOF. The argument for f + g and for functions with values in $\mathbb{R}^n$ will illustrate 
matters sufficiently. We set up $x \mapsto f(x) + g(x)$ as a suitable composition, 
expressing the composition in a diagram: 

$$X \xrightarrow{\; x \mapsto (x, x) \;} X \times X \xrightarrow{\; (x, y) \mapsto (f(x), g(y)) \;} \mathbb{R}^n \times \mathbb{R}^n \xrightarrow{\; (u, v) \mapsto u + v \;} \mathbb{R}^n.$$

Each function in the diagram is continuous, the last of them by Proposition 2.28a, 
and then the composition is continuous by Corollary 2.14. 


We conclude this section with one further remark. When (X, d) is a metric 
space, we saw in Corollary 2.17 that $x \mapsto d(x, y)$ and $y \mapsto d(x, y)$ are continuous 
functions from X to R. Actually, $(x, y) \mapsto d(x, y)$ is a continuous function 
from $X \times X$ into R if we use the product metric. In fact, if $\rho_\infty$ denotes the 
product metric with $\rho_\infty((x, y), (x_0, y_0)) = \max\{d(x, x_0), d(y, y_0)\}$, then we 
have $d(x, y) \le d(x, x_0) + d(x_0, y_0) + d(y_0, y)$ and therefore 

$$d(x, y) - d(x_0, y_0) \le d(x, x_0) + d(y, y_0).$$

Reversing the roles of (x, y) and $(x_0, y_0)$, we see that 

$$|d(x, y) - d(x_0, y_0)| \le d(x, x_0) + d(y, y_0) \le 2\max\{d(x, x_0), d(y, y_0)\} = 2\rho_\infty((x, y), (x_0, y_0)).$$

From this chain of inequalities, it follows that d is continuous with $\delta = \epsilon/2$. 


6. Properties of Metric Spaces 


This section contains two results about metric spaces. One lists a number of 
“separation properties” of sets within any metric space. The other concerns the 
completely different property of “separability,” which is satisfied by some metric 
spaces and not by others, and it says that separability may be defined in any of 
three equivalent ways. 


Proposition 2.30 (separation properties). Let (X, d) be a metric space. Then 
(a) every one-point subset of X is a closed set, i.e., X is $T_1$, 
(b) for any two distinct points x and y of X, there are disjoint open sets U 
and V with $x \in U$ and $y \in V$, i.e., X is Hausdorff, 
(c) for any point $x \in X$ and any closed set $F \subseteq X$ with $x \notin F$, there are 
disjoint open sets U and V with $x \in U$ and $F \subseteq V$, i.e., X is regular, 
(d) for any two disjoint closed subsets E and F of X, there are disjoint open 
sets U and V such that $E \subseteq U$ and $F \subseteq V$, i.e., X is normal, 
(e) for any two disjoint closed subsets E and F of X, there is a continuous 
function $f : X \to [0, 1]$ such that f is 0 exactly on E and f is 1 exactly 
on F. 

PROOF. For (a), the set {x} is the intersection of all closed balls of radius r about x for 
r > 0 and hence is closed by Corollary 2.18 and Proposition 2.8b. For (e), the 
function $f(x) = D(x, E)/(D(x, E) + D(x, F))$ is continuous by Proposition 
2.16 and Corollary 2.29 and takes on the values 0 and 1 exactly on E and F, 
respectively, by Proposition 2.19. 

For (d), we need only apply (e) and Proposition 2.15b with $U = f^{-1}((-\infty, \tfrac{1}{2}))$ 
and $V = f^{-1}((\tfrac{1}{2}, +\infty))$. Conclusions (a) and (d) imply (c), and conclusions (a) 
and (c) imply (b). This completes the proof. 
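
The separating function used for (e) is easy to compute when E and F are finite, hence closed, subsets of the plane. The following Python sketch is only an illustration: the particular sets E and F, the use of the Euclidean metric on $\mathbb{R}^2$, and the function names are assumptions, not part of the text.

```python
import math


def dist(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dist_to_set(p, A):
    """D(p, A): the infimum of d(p, a) over a in the finite nonempty set A."""
    return min(dist(p, a) for a in A)


def separating_function(p, E, F):
    """f(p) = D(p, E) / (D(p, E) + D(p, F)) for disjoint closed sets E and F."""
    d_e, d_f = dist_to_set(p, E), dist_to_set(p, F)
    # The denominator is nonzero because E and F are disjoint and closed.
    return d_e / (d_e + d_f)


# Hypothetical disjoint finite sets in the plane.
E = [(0.0, 0.0), (1.0, 0.0)]
F = [(0.0, 2.0), (3.0, 1.0)]

if __name__ == "__main__":
    print([separating_function(e, E, F) for e in E])
    print([separating_function(f, E, F) for f in F])
    print(separating_function((0.0, 1.0), E, F))
```

Points where the value is below 1/2 lie in the open set U of the proof of (d), and points where it is above 1/2 lie in V.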


A base $\mathcal{B}$ for a metric space (X, d) is a family of open sets such that every 
open set is a union of members of $\mathcal{B}$. The family of all open balls is an example 
of a base. 


Proposition 2.31. If (X, d) is a metric space, then a family $\mathcal{B}$ of subsets of X 
is a base for (X, d) if and only if 
(a) every member of $\mathcal{B}$ is open and 
(b) for each $x \in X$ and open neighborhood U of x, there is some member B 
of $\mathcal{B}$ such that x is in B and B is contained in U. 


PROOF. If $\mathcal{B}$ is a base, then (a) holds by definition of base. If U is open in X, 
then $U = \bigcup_\alpha B_\alpha$ for some members $B_\alpha$ of $\mathcal{B}$, and any such $B_\alpha$ containing x can 
be taken as the set B in (b). 

Conversely suppose that $\mathcal{B}$ satisfies (a) and (b). By (a), each member of $\mathcal{B}$ is 
open in X. If U is open in X, we are to show that U is a union of members of $\mathcal{B}$. 
For each $x \in U$, choose some set $B = B_x$ as in (b). Then $U = \bigcup_{x \in U} B_x$, and 
hence each open set in X is a union of members of $\mathcal{B}$. Thus $\mathcal{B}$ is a base. 


This book uses the word countable to mean finite or countably infinite. It is 
then meaningful to ask whether a particular metric space (X, d) has a countable 
base. On the real line R, the open intervals with rational endpoints form a 
countable base. 
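
Criterion (b) of Proposition 2.31 for the rational-endpoint intervals can be made concrete: given a point x in an open interval (a, b), one can exhibit rational endpoints p and q with a < p < x < q < b. The sketch below is one possible construction under stated assumptions; the function name and the choice of dyadic denominators are illustrative and not from the text.

```python
import math
from fractions import Fraction


def rational_interval(x, a, b):
    """Return rationals (p, q) with a < p < x < q < b, assuming a < x < b.

    Denominators k run through powers of 2, so the product x * k is
    computed exactly for floating-point x (barring overflow).
    """
    if not (a < x < b):
        raise ValueError("x must lie strictly between a and b")
    k = 1
    while True:
        # p is the largest multiple of 1/k strictly below x,
        # q is the smallest multiple of 1/k strictly above x.
        p = Fraction(math.ceil(x * k) - 1, k)
        q = Fraction(math.floor(x * k) + 1, k)
        if a < p and q < b:
            return p, q
        k *= 2


if __name__ == "__main__":
    p, q = rational_interval(0.3, 0.1, 0.5)
    print(p, q)
```

The loop stops once 1/k is smaller than both x - a and b - x, since p and q always lie within 1/k of x.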

A subset D of X is dense in a subset A of X if $D^{\mathrm{cl}} \supseteq A$; D is dense, or 
everywhere dense, if D is dense in X. A set D is dense if and only if there is 
some point of D in each nonempty open set of X. 

A family $\mathcal{U}$ of open sets is an open cover of X if the union of the sets in $\mathcal{U}$ is 
X. An open subcover of $\mathcal{U}$ is a subfamily of $\mathcal{U}$ that is itself an open cover. 


Proposition 2.32. The following three conditions are equivalent for a metric 
space (X, d): 
(a) X has a countable base, 
(b) every open cover of X has a countable open subcover, 
(c) X has a countable dense subset. 
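
PROOF. If (a) holds, let $\mathcal{B} = \{B_n\}_{n \ge 1}$ be a countable base, and let $\mathcal{U}$ be an 
open cover of X. Any $U \in \mathcal{U}$ is the union of the $B_n \in \mathcal{B}$ with $B_n \subseteq U$. If $\mathcal{B}_0 = 
\{B_n \in \mathcal{B} \mid B_n \subseteq U \text{ for some } U \in \mathcal{U}\}$, then it follows that $\bigcup_{B_n \in \mathcal{B}_0} B_n = \bigcup_{U \in \mathcal{U}} U = 
X$. For each $B_n$ in $\mathcal{B}_0$, select some $U_n$ in $\mathcal{U}$ with $B_n \subseteq U_n$. Then $\bigcup_n U_n \supseteq 
\bigcup_{B_n \in \mathcal{B}_0} B_n = X$, and $\{U_n\}$ is a countable open subcover of $\mathcal{U}$. Thus (b) holds. 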

If (b) holds, form, for each fixed $n \ge 1$, the open cover of X consisting of 
all open balls $B(1/n; x)$. For that n, let $\{B(1/n; x_{mn})\}_{m \ge 1}$ be a countable open 
subcover. We shall prove that the set D of all $x_{mn}$, with m and n arbitrary, is dense 
in X. It is enough to prove that each nonempty open set in X contains a member 
of D, hence to prove, for each n, that each open ball of radius 1/n contains a 
member of D. Thus consider $B(1/n; x)$. Since the open balls $B(1/n; x_{mn})$ with 
$m \ge 1$ cover X, x is in some $B(1/n; x_{mn})$. Then that $x_{mn}$ has $d(x_{mn}, x) < 1/n$, 
and hence $x_{mn}$ is in $B(1/n; x)$. Thus D is dense, and (c) holds. 

If (c) holds, let $\{x_n\}_{n \ge 1}$ be a countable dense set. Form the collection of all open 
balls centered at some $x_n$ and having rational radius. Let us use Proposition 2.31 
to see that this collection of open sets, which is certainly countable, is a base. Let 
U be an open neighborhood of x. We are to see that there is some member B of our 
collection such that x is in B and B is contained in U. Since U is a neighborhood 
of x, we can find an open ball $B(r; x)$ such that $B(r; x) \subseteq U$; we may assume 
that r is rational. The given set $\{x_n\}_{n \ge 1}$ being dense, some $x_n$ lies in $B(r/2; x)$. 
If y is in $B(r/2; x_n)$, then $d(x, y) \le d(x, x_n) + d(x_n, y) < \frac{r}{2} + \frac{r}{2} = r$. Hence 
x lies in $B(r/2; x_n)$ and $B(r/2; x_n) \subseteq B(r; x) \subseteq U$. Since r/2 is rational, the 
open ball $B(r/2; x_n)$ is in our countable collection, and our countable collection 
is a base. This proves (a). 




A metric space satisfying the equivalent conditions of Proposition 2.32 is 
said to be separable. Among the examples of metric spaces in Section 1, the 
ones in Examples 1, 2, 3, 6, 8 if X is countable, 9, 11, 12, 13, and 14 are 
separable. A countable dense set in Examples 1, 2, and 3 is given by all points 
with all coordinates rational. In Example 6, one countable dense set consists of 
all sequences with only finitely many nonzero entries, those being rational, and in 
Examples 8 and 9, X itself is a countable dense set. In Example 11, the sequences 
that are 0 in all but finitely many entries, those being rational, form a countable 
dense set. In Example 13, the set of finite linear combinations of exponentials 
$e^{inx}$ using scalars in $\mathbb{Q} + i\mathbb{Q}$ is dense as a consequence of Parseval's equality. In 
Example 12, when $[a, b] = [-\pi, \pi]$, the same countable set as for Example 13 
is dense by Proposition 2.25 because the sets of functions in Examples 12 and 13 
coincide and the inclusion of $L^2$ into $L^1$ is continuous. In Example 14, the set of 
polynomials with coefficients in $\mathbb{Q} + i\mathbb{Q}$ is countable and can be shown to be dense. 

Example 10 is not separable, and Example 8 is not separable if X is uncountable. 


7. Compactness and Completeness 


In Section 6 we introduced the notions of open cover and subcover for a metric 
space. We call a metric space compact if every open cover of the space has a 
finite subcover. A subset E of a metric space (X, d) is compact if it is compact 
as a subspace of the whole space, i.e., if every collection of open sets in X whose 
union contains E has a finite subcollection whose union contains E. 

Historically this notion was embodied in the Heine–Borel Theorem, which 
says that any closed bounded subset of Euclidean space has the property that 
has just been defined to be compactness. As we shall see in Theorem 2.36 and 
Corollary 2.37 below, the Heine–Borel Theorem can be proved from the 
Bolzano–Weierstrass Theorem (Theorem 1.8) and leads to faster, more transparent proofs of 
some of the consequences of the Bolzano–Weierstrass Theorem. Even more 
important is that it generalizes beyond metric spaces and produces useful conclusions 
about certain spaces of functions when statements about pointwise convergence 
of a sequence of functions are inadequate. 

Easily established examples of compact sets are hard to come by. For one 
example, consider in a metric space (X, d) a convergent sequence $\{x_n\}$ along 
with its limit x. The subset $E = \{x\} \cup \bigcup_n \{x_n\}$ of X is compact. In fact, if $\mathcal{U}$ is 
an open cover of E, some member U of $\mathcal{U}$ has x as an element, and then all but 
finitely many elements of the sequence must be in U as well. Say that U contains 
x and all $x_n$ with $n \ge N$. For $1 \le n < N$, let $U_n$ be a member of $\mathcal{U}$ containing 
$x_n$. Then $\{U, U_1, \dots, U_{N-1}\}$ is a finite subcover of $\mathcal{U}$. 

It is easier to exhibit noncompact sets. The open interval (0, 1) is not compact, 
as is seen from the open cover $\{(\tfrac{1}{n}, 1)\}$. Nor is an infinite discrete space, since 
one-point sets form an open cover. A subtle, dramatic example is the closed unit 
ball C of the hedgehog space X, Example 10 in Section 1; this set is not compact. 
In fact, the open ball of radius 1/2 about the origin is an open set in X, and so 
is each open ray from the origin out to infinity. Let $\mathcal{U}$ be this collection of open 
sets. Then $\mathcal{U}$ is an open cover of C. However, no member of $\mathcal{U}$ is superfluous, 
since for each U in $\mathcal{U}$, there is some point x in C such that x is in U but x is in no 
other member of $\mathcal{U}$. Thus $\mathcal{U}$ does not contain even a countable subcover. 

Let us now work directly toward a proof of the equivalence of compactness 
and the Bolzano—Weierstrass property in a metric space. 


Proposition 2.33. A compact metric space is separable. 


PROOF. This is immediate from equivalent condition (b) for the definition of 
separability in Proposition 2.32. 


Proposition 2.34. In any metric space (X, d), 


(a) every compact subset is closed and bounded and 
(b) any closed subset of a compact set is compact. 


PROOF. For (a), let E be a compact subset of X, fix $x_0$ in X, and let $U_n$ for 
$n \ge 1$ be the open ball $\{x \in X \mid d(x_0, x) < n\}$. Then $\{U_n\}$ is an open cover of E. 
Since the $U_n$'s are nested, the compactness of E implies that E is contained in a 
single $U_N$ for some N. Then every member of E is at distance at most N from 
$x_0$, and E is bounded. 

To see that E is closed, we argue by contradiction. Let $x_0$ be a limit point of E 
that is not in E. By the Hausdorff property (Proposition 2.30b), we can find, for 
each $x \in E$, open sets $U_x$ and $V_x$ with $x \in U_x$, $x_0 \in V_x$, and $U_x \cap V_x = \varnothing$. The 
sets $U_x$ form an open cover of E. By compactness let $\{U_{x_1}, \dots, U_{x_n}\}$ be a finite 
subcover. Then $E \subseteq U_{x_1} \cup \cdots \cup U_{x_n}$, which is disjoint from the neighborhood 
$V_{x_1} \cap \cdots \cap V_{x_n}$ of $x_0$. Thus $x_0$ cannot be a limit point of E, and we have arrived 
at a contradiction. This proves (a). 

For (b), let E be compact, and let F be a closed subset of E. Because of (a), F 
is a closed subset of X. Let $\mathcal{U}$ be an open cover of F. Then $\mathcal{U} \cup \{F^c\}$ is an open 
cover of E. Passing to a finite subcover and discarding $F^c$, we obtain a finite 
subcover of F. Thus F is compact. 


A collection of subsets of a nonempty set is said to have the finite-intersection 
property if each intersection of finitely many of the subsets is nonempty. 


Proposition 2.35. A metric space (X, d) is compact if and only if each 
collection of closed subsets of X with the finite-intersection property has nonempty 
intersection. 


PROOF. Closed sets with the finite-intersection property have complements 
that are open sets, no finite subcollection of which is an open cover. 


Theorem 2.36. A metric space (X, d) is compact if and only if every sequence 
has a convergent subsequence. 


PROOF. Suppose that X is compact. Arguing by contradiction, suppose that 
$\{x_n\}_{n \ge 1}$ is a sequence in X with no convergent subsequence. Put $F = \bigcup_{n=1}^{\infty} \{x_n\}$. 
The subset F of X is closed by Corollary 2.23, hence compact by Proposition 
2.34b. Since no $x_n$ is a limit point of F, there exists an open set $U_n$ in X containing 
$x_n$ but no other member of F. Then $\{U_n\}_{n \ge 1}$ is an open cover of F with no finite 
subcover, and we have arrived at a contradiction. 

Conversely suppose that every sequence has a convergent subsequence. We 
first show that X is separable. Fix an integer n. There cannot be infinitely many 
disjoint open balls of radius 1/n, since otherwise we could find a sequence from 
among their centers with no convergent subsequence. Thus we can choose a finite 
disjoint collection of these open balls that is not contained in a larger such finite 
collection. Let their centers be $x_1, \dots, x_N$. The claim is that every point of X is 
at distance < 2/n from one of these finitely many centers. In fact, if $x \in X$ is 
given, form $B(1/n; x)$. This must meet some $B(1/n; x_j)$ at a point y, and then 

$$d(x, x_j) \le d(x, y) + d(y, x_j) < \frac{1}{n} + \frac{1}{n} = \frac{2}{n}.$$

Thus x is at distance < 2/n from one of the finitely many centers, as asserted. 
Now let n vary, and let D be the set of all these centers for all n. Then every point 
of X has members of D arbitrarily close to it, and hence D is a countable dense 
set in X. Thus X is separable. 

Arguing by contradiction, let $\mathcal{U}$ be an open cover of X having no finite subcover. 
By the separability and condition (b) in Proposition 2.32, we may assume that $\mathcal{U}$ is 
countable, say $\mathcal{U} = \{U_1, U_2, \dots\}$. Since $U_1 \cup U_2 \cup \cdots \cup U_n$ is not a cover, there exists a point 
$x_n$ not in the union of the first n sets. By hypothesis the sequence $\{x_n\}$ has a 
convergent subsequence $\{x_{n_k}\}$, say with limit x. Since $\mathcal{U}$ is a cover, some member 
$U_N$ of $\mathcal{U}$ contains x. Then $\{x_{n_k}\}$ is eventually in $U_N$, and some $n_k$ with $n_k \ge N$ 
has $x_{n_k}$ in $U_N$. But $x_{n_k}$ is not in $U_1 \cup \cdots \cup U_{n_k}$ by construction, and this union 
contains $U_N$, since $n_k \ge N$. We have arrived at a contradiction, and we conclude 
that $\mathcal{U}$ must have had a finite subcover. 


Corollary 2.37 (Heine–Borel Theorem). In Euclidean space $\mathbb{R}^n$, every closed
bounded set is compact.


REMARK. Conversely we saw in Proposition 2.34a that every compact subset 
of any metric space is closed and bounded. 


PROOF. Let C be a closed rectangular solid in $\mathbb{R}^n$, and let $x^{(m)} = (x_1^{(m)}, \dots, x_n^{(m)})$
be the members of a sequence in C. By the Bolzano–Weierstrass Theorem
(Theorem 1.8) for $\mathbb{R}^1$, we can find a subsequence convergent in the first coordinate,
a subsequence of that convergent in the second coordinate, and so on. Thus $\{x^{(m)}\}$
has a convergent subsequence. By Theorem 2.36, C is compact. Applying
Proposition 2.34b, we see that every closed bounded subset of $\mathbb{R}^n$ is compact.


The next few results will show how the use of compactness both simplifies and 
generalizes some of the theorems proved in Section I.1. 


Proposition 2.38. Let $(X, d)$ and $(Y, \rho)$ be metric spaces with X compact. If
$f : X \to Y$ is continuous, then $f(X)$ is a compact subset of Y.


PROOF. If $\{U_\alpha\}$ is an open cover of $f(X)$, then $\{f^{-1}(U_\alpha)\}$ is an open cover of
X. Let $\{f^{-1}(U_{\alpha_j})\}_{j=1}^{n}$ be a finite subcover. Then $\{U_{\alpha_j}\}_{j=1}^{n}$ is a finite subcover of
$f(X)$.


Corollary 2.39. Let $(X, d)$ be a compact metric space, and let $f : X \to \mathbb{R}$ be
a continuous function. Then f attains its maximum and minimum values.

REMARK. Theorem 1.11 was the special case of this result with $X = [a, b]$.
This particular space X is compact by the Heine–Borel Theorem (Corollary 2.37),
and the corollary applies to yield exactly the conclusion of Theorem 1.11.


PROOF. By Proposition 2.38, $f(X)$ is a compact subset of $\mathbb{R}$. By Proposition
2.34a, $f(X)$ is closed and bounded. The supremum and infimum of the members
of $f(X)$ in $\mathbb{R}^*$ lie in $\mathbb{R}$, since $f(X)$ is bounded, and they are limits of sequences in
$f(X)$. Since $f(X)$ is closed, Corollary 2.23 shows that they must lie in $f(X)$.


Corollary 2.40. Let $(X, d)$ and $(Y, \rho)$ be metric spaces with X compact. If
$f : X \to Y$ is continuous, one-one, and onto, then f is a homeomorphism.

REMARK. In the hypotheses of the change of variables formula for integrals
in $\mathbb{R}^1$ (Theorem 1.34), a function $\varphi : [A, B] \to [a, b]$ was given as strictly
increasing, continuous, and onto. Another hypothesis of the theorem was that
$\varphi^{-1}$ was continuous. Corollary 2.40 shows that this last hypothesis was redundant.

PROOF. Let E be a closed subset of X, and consider $(f^{-1})^{-1}(E) = f(E)$.
The set E is compact by Proposition 2.34b, $f(E)$ is compact by Proposition 2.38,
and $f(E)$ is closed by Proposition 2.34a. Proposition 2.15b thus shows that $f^{-1}$
is continuous.


If $(X, d)$ and $(Y, \rho)$ are metric spaces, a function $f : X \to Y$ is uniformly
continuous if for each $\epsilon > 0$, there is some $\delta > 0$ such that $d(x_1, x_2) < \delta$
implies $\rho(f(x_1), f(x_2)) < \epsilon$. This is the natural generalization of the definition
in Section I.1 for the special case of a real-valued function of a real variable.


Proposition 2.41. Let $(X, d)$ and $(Y, \rho)$ be metric spaces with X compact. If
$f : X \to Y$ is continuous, then f is uniformly continuous.


REMARK. This result generalizes Theorem 1.10, which is the special case 
X =[a,b] and Y =R. 


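PROOF. Let $\epsilon > 0$ be given. For each $x \in X$, choose $\delta_x > 0$ such
that $d(x', x) < \delta_x$ implies $\rho(f(x'), f(x)) < \epsilon/2$. The open balls $B(\tfrac12\delta_x; x)$
cover X; let the balls with centers $x_1, \dots, x_n$ be a finite subcover. Put
$\delta = \tfrac12 \min\{\delta_{x_1}, \dots, \delta_{x_n}\}$. Now suppose that $d(x', x) < \delta$. The point x is in some ball
in the finite subcover; suppose x is in $B(\tfrac12\delta_{x_j}; x_j)$. Then $d(x, x_j) < \tfrac12\delta_{x_j}$, so that


$$d(x', x_j) \le d(x', x) + d(x, x_j) < \delta + \tfrac12\delta_{x_j} \le \delta_{x_j}.$$


By definition of $\delta_{x_j}$, $\rho(f(x'), f(x_j)) < \epsilon/2$ and $\rho(f(x_j), f(x)) < \epsilon/2$. Therefore


$$\rho(f(x'), f(x)) \le \rho(f(x'), f(x_j)) + \rho(f(x_j), f(x)) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$


and the proof is complete.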
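As a numerical illustration of Proposition 2.41, and only a sketch, one can estimate on a grid how much a continuous function on a compact interval varies over pairs of points that are close together. The helper name `modulus_of_continuity`, the grid size, and the choice of the square root function on [0, 1] are assumptions made for this example; a grid computation suggests the uniform bound but does not prove it.

```python
import math

def modulus_of_continuity(f, a, b, delta_steps, grid_size):
    """Largest |f(x) - f(y)| over grid points of [a, b] that are at most
    `delta_steps` grid steps apart (so |x - y| <= delta_steps * (b - a) / grid_size)."""
    h = (b - a) / grid_size
    values = [f(a + i * h) for i in range(grid_size + 1)]
    worst = 0.0
    for i in range(grid_size + 1):
        for j in range(i + 1, min(i + delta_steps, grid_size) + 1):
            worst = max(worst, abs(values[j] - values[i]))
    return worst

# The square root is continuous on the compact interval [0, 1]; the same
# delta = 10/1000 controls its variation near every point of the interval.
omega = modulus_of_continuity(math.sqrt, 0.0, 1.0, delta_steps=10, grid_size=1000)
print(omega)
```

Because the square root is concave, the largest variation over a window of fixed width occurs at the left endpoint 0, which is where the estimate comes from.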


One final application of compactness is the Fundamental Theorem of Algebra,
which is discussed in Section A8 of the appendix in the context of properties of
polynomials.


Theorem 2.42 (Fundamental Theorem of Algebra). Every polynomial with
complex coefficients and degree $\ge 1$ has a complex root.


PROOF. Let $P : \mathbb{C} \to \mathbb{C}$ be the function $P(z) = \sum_{j=0}^{n} a_j z^j$, where $a_0, \dots, a_n$
are in $\mathbb{C}$ with $a_n \ne 0$ and with $n \ge 1$. We may assume that $a_n = 1$. Let
$m = \inf_{z \in \mathbb{C}} |P(z)|$. Since $P(z) = z^n(1 + a_{n-1}z^{-1} + \dots + a_1 z^{-(n-1)} + a_0 z^{-n})$, we have
$\lim_{z \to \infty} P(z)/z^n = 1$. Thus there exists an R such that $|P(z)| \ge \tfrac12 |z|^n$ whenever
$|z| \ge R$. Choosing $R = R_0$ such that $\tfrac12 R_0^n \ge 2m$, we see that $|P(z)| \ge 2m$ for
$|z| \ge R_0$. Consequently $m = \inf_{|z| \le R_0} |P(z)|$. The set $S = \{z \in \mathbb{C} \mid |z| \le R_0\}$
is compact by the Heine–Borel Theorem (Corollary 2.37), and Corollary 2.39
shows that $|P(z)|$ attains its minimum on S at some point $z_0$ in S. Then $|P(z)|$
attains its minimum on $\mathbb{C}$ at $z_0$. We shall show that this minimum value m is 0.

Assuming the contrary, define $Q(z) = P(z + z_0)/P(z_0)$, so that $Q(z)$ is a
polynomial of degree $n \ge 1$ with $Q(0) = 1$ and $|Q(z)| \ge 1$ for all z. Write


$$Q(z) = 1 + b_k z^k + b_{k+1} z^{k+1} + \dots + b_n z^n \quad \text{with } b_k \ne 0.$$


Corollary 1.45 produces a real number $\theta$ such that $e^{ik\theta} b_k = -|b_k|$. For any $r > 0$
with $r^k |b_k| < 1$, we then have


$$|1 + b_k r^k e^{ik\theta}| = 1 - r^k |b_k|.$$

For such r and that $\theta$, this equality implies that


$$|Q(re^{i\theta})| \le |1 + b_k r^k e^{ik\theta}| + r^{k+1} |b_{k+1}| + \dots + r^n |b_n|$$


$$= 1 - r^k \bigl( |b_k| - r |b_{k+1}| - \dots - r^{n-k} |b_n| \bigr).$$


For sufficiently small $r > 0$, the expression in parentheses on the right side is
positive, and then $|Q(re^{i\theta})| < 1$, in contradiction to hypothesis. Thus we must
have had $m = 0$, and we obtain $P(z_0) = 0$.
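The second half of the proof is constructive in spirit: at any point where P does not vanish, it names a direction and a small step along which |P| strictly decreases. The sketch below carries out that single step numerically. The function names, the tolerance `tol`, the halving search for r, and the sample polynomial are all assumptions made for illustration; the proof itself only asserts that sufficiently small r works.

```python
import cmath

def evaluate(coeffs, z):
    """Horner evaluation; coeffs[j] is the coefficient of z**j."""
    result = 0j
    for a in reversed(coeffs):
        result = result * z + a
    return result

def taylor_shift(coeffs, z0):
    """Coefficients of P(z + z0) in powers of z, by repeated synthetic division."""
    c = [complex(a) for a in coeffs]
    n = len(c)
    for i in range(n - 1):
        for j in range(n - 2, i - 1, -1):
            c[j] += z0 * c[j + 1]
    return c

def descent_step(coeffs, z0, tol=1e-12):
    """Return z1 with |P(z1)| < |P(z0)|, assuming P(z0) != 0 and deg P >= 1."""
    p0 = evaluate(coeffs, z0)
    # b holds the coefficients of Q(z) = P(z + z0) / P(z0), so b[0] is 1.
    b = [c / p0 for c in taylor_shift(coeffs, z0)]
    # k is the first index >= 1 with b_k nonzero (up to the tolerance).
    k = next(j for j in range(1, len(b)) if abs(b[j]) > tol)
    # Choose theta so that e^{ik theta} b_k = -|b_k|.
    theta = (cmath.pi - cmath.phase(b[k])) / k
    direction = cmath.exp(1j * theta)
    r = 1.0
    while abs(evaluate(b, r * direction)) >= 1:
        r /= 2  # shrink r until |Q(r e^{i theta})| < 1
    return z0 + r * direction

# Hypothetical example: P(z) = z^3 - 2z + 2, starting from z0 = 1 where P(1) = 1.
P = [2, -2, 0, 1]
z1 = descent_step(P, 1.0)
print(z1, abs(evaluate(P, z1)))
```

Repeating the step gives a decreasing sequence of values of |P|, but nothing in this sketch shows that the sequence approaches a root; the existence of the minimizing point in the proof comes from compactness, not from iteration.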


Another theme discussed in Section I.1 is that Cauchy sequences in $\mathbb{R}^1$ are
convergent. This convergence was proved in Theorem 1.9 as a consequence of
the Bolzano–Weierstrass Theorem. Actually, many sequences in metric spaces of
importance in analysis are shown to converge without one’s knowing the limit in
advance and without using any compactness, and we therefore isolate the forced
convergence of Cauchy sequences as a definition. In a metric space $(X, d)$, a
sequence $\{x_n\}$ is a Cauchy sequence if for any $\epsilon > 0$, there is some integer N
such that $d(x_m, x_n) < \epsilon$ whenever m and n are $\ge N$. A familiar $2\epsilon$ argument
shows that convergent sequences are Cauchy. Other familiar arguments show 
that any Cauchy sequence with a convergent subsequence is convergent and that 
any Cauchy sequence is bounded. 

We say that the metric space (X, d) is complete if every Cauchy sequence in 
X converges to a point in X. We know that the line $\mathbb{R}^1$ is complete. It follows that
$\mathbb{R}^n$ is complete because a Cauchy sequence in $\mathbb{R}^n$ is Cauchy in each coordinate.
A nonempty subset E of X is complete if E as a subspace is a complete metric 
space. The next two propositions and corollary give three examples of complete 
metric spaces. 


Proposition 2.43. A subset E of a complete metric space X is complete if and 
only if it is closed. 


REMARK. In particular every closed subset of $\mathbb{R}^n$ is a complete metric space.


PROOF. Suppose E is closed. Let $\{x_n\}$ be a Cauchy sequence in E. Then $\{x_n\}$
is Cauchy in X, and the completeness of X implies that $\{x_n\}$ converges, say to
some $x \in X$. By Corollary 2.23, x is in E. Thus $\{x_n\}$ is convergent in E. The
converse is immediate from Corollary 2.23.


Proposition 2.44. If S is a nonempty set, then the vector space B(S) of 
bounded scalar-valued functions on S, with the uniform metric, is a complete 
metric space. 


PROOF. Let $\{f_n\}$ be a Cauchy sequence in B(S). Then $\{f_n(x)\}$ is a Cauchy
sequence in $\mathbb{C}$ for each x in S. Define $f(x) = \lim_n f_n(x)$. For any $\epsilon > 0$,
we know that there is an integer N such that $|f_n(x) - f_m(x)| < \epsilon$ for all x in S whenever
n and m are $\ge N$. Taking into account the continuity of the distance function
on $\mathbb{C}$, i.e., the continuity of absolute value, we let m tend to infinity and obtain
$|f_n(x) - f(x)| \le \epsilon$ for all x in S and $n \ge N$. Thus $\{f_n\}$ converges to f in B(S).


Corollary 2.45. Let (S, d) be a metric space. Then the vector space C(S) of 
bounded continuous scalar-valued functions on S, with the uniform metric, is a 
complete metric space. 


REMARK. C(S) was observed to be a vector subspace in the remarks with 
Corollary 2.29. 


PROOF. The space B(S) is complete by Proposition 2.44, and C(S) is a closed 
metric subspace by Corollary 2.24. Then C(S) is complete by Proposition 2.43. 


Now we shall relate compactness and completeness. A metric space (X, d) is 
said to be totally bounded if for any $\epsilon > 0$, finitely many open balls of radius $\epsilon$
cover X.


Theorem 2.46. A metric space $(X, d)$ is compact if and only if it is totally
bounded and complete.


PROOF. Let $(X, d)$ be compact. If $\epsilon > 0$ is given, the open balls $B(\epsilon; x)$
cover X. By compactness some finite number of the balls cover X. Therefore
X is totally bounded. Next let a Cauchy sequence $\{x_n\}$ be given. By Theorem
2.36, $\{x_n\}$ has a convergent subsequence. A Cauchy sequence with a convergent
subsequence is necessarily convergent, and it follows that X is complete.

In the reverse direction, let X be totally bounded and complete. Theorem 2.36
shows that it is enough to prove that any sequence $\{x_n\}$ in X has a convergent
subsequence. By total boundedness, find finitely many open balls of radius 1
covering X. Then infinitely many of the $x_n$'s have to lie in one of these balls,
and hence there is a subsequence $\{x_{n_k}\}$ that lies in a single one of these balls of
radius 1. Next finitely many open balls of radius 1/2 cover X. In the same way
there is a subsequence $\{x_{n_{k_l}}\}$ of $\{x_{n_k}\}$ that lies in a single one of these balls of
radius 1/2. Continuing in this way, we can find successive subsequences, the $m$th
of which lies in a single ball of radius $1/m$. The Cantor diagonal process, used in
the proof of Theorem 1.22, allows us to form a single subsequence $\{x_{j_k}\}$ of $\{x_n\}$
such that for each m, $\{x_{j_k}\}$ is eventually in a ball of radius $1/m$. If $\epsilon > 0$ is given,
find m such that $1/m < \epsilon$, and let $c_m$ be the center of the ball of radius $1/m$.
Choose an integer N such that $x_{j_k}$ lies in $B(1/m; c_m)$ whenever $k \ge N$. If $k \ge N$
and $k' \ge N$, then $d(c_m, x_{j_k}) < \epsilon$ and $d(c_m, x_{j_{k'}}) < \epsilon$, whence $d(x_{j_k}, x_{j_{k'}}) < 2\epsilon$.
Therefore the subsequence $\{x_{j_k}\}$ is Cauchy. By completeness it converges. Hence
$\{x_n\}$ has a convergent subsequence, and the theorem is proved.


Let $(X, d)$ and $(Y, \rho)$ be metric spaces, and let $f : X \to Y$ be uniformly
continuous. Then f carries Cauchy sequences to Cauchy sequences. In fact, if
$\{x_n\}$ is Cauchy in X and if $\epsilon > 0$ is given, choose some $\delta$ of uniform continuity
for f and $\epsilon$, and find an integer N such that $d(x_n, x_{n'}) < \delta$ whenever n and n' are
$\ge N$. Then $\rho(f(x_n), f(x_{n'})) < \epsilon$ for the same n's and n''s, and hence $\{f(x_n)\}$
is Cauchy.


Proposition 2.47. Let $(X, d)$ and $(Y, \rho)$ be metric spaces with Y complete,
let D be a dense subset of X, and let $f : D \to Y$ be uniformly continuous. Then
f extends uniquely to a continuous function $F : X \to Y$, and F is uniformly
continuous.


PROOF OF UNIQUENESS. If x is in X, apply Proposition 2.22a to choose a
sequence $\{x_n\}$ in D with $x_n \to x$. Continuity of F forces $F(x_n) \to F(x)$. But
$F(x_n) = f(x_n)$ for all n. Thus $F(x) = \lim_n f(x_n)$ is forced.


PROOF OF EXISTENCE. If x is in X, choose $x_n \in D$ with $x_n \to x$. Since
$\{x_n\}$ is convergent, it is Cauchy. Since f is uniformly continuous, $\{f(x_n)\}$ is
Cauchy. The completeness of Y then allows us to define $F(x) = \lim f(x_n)$, but
we must see that F is well defined. For this purpose, suppose also that $\{y_n\}$ is a
sequence in D that converges to x. Let $\{z_n\}$ be the sequence $x_1, y_1, x_2, y_2, \dots$.
This sequence is Cauchy, and $\{x_n\}$ and $\{y_n\}$ are subsequences of it. Therefore
$\lim f(y_n) = \lim f(z_n) = \lim f(x_n)$, and F(x) is well defined.

For the uniform continuity of F, let $\epsilon > 0$ be given, and choose some $\delta$
of uniform continuity for f and $\epsilon/3$. Suppose that x and x' are in X with
$d(x, x') < \delta/3$. Choose $x_n$ in D with $d(x_n, x) < \delta/3$ and $\rho(f(x_n), F(x)) < \epsilon/3$,
and choose $x'_n$ in D with $d(x'_n, x') < \delta/3$ and $\rho(f(x'_n), F(x')) < \epsilon/3$. Then
$d(x_n, x'_n) < \delta$ by the triangle inequality, and hence $\rho(f(x_n), f(x'_n)) < \epsilon/3$.
Thus $\rho(F(x), F(x')) < \epsilon$ by the triangle inequality.


8. Connectedness 


Although the Intermediate Value Theorem (Theorem 1.12) in Section I.1 was 
derived from the Bolzano–Weierstrass Theorem, the Intermediate Value Theorem
is not to be regarded as a consequence of compactness. Instead, the relevant 
property is “connectedness,” which we discuss in this section. 

A metric space $(X, d)$ is connected if X cannot be written as $X = U \cup V$
with U and V open, disjoint, and nonempty. A subset E of X is connected if E
is connected as a subspace of X, i.e., if E cannot be written as a disjoint union
$(E \cap U) \cup (E \cap V)$ with U and V open in X and with $E \cap U$ and $E \cap V$ both
nonempty. The disjointness in this definition is of $E \cap U$ and $E \cap V$; the open
sets U and V may have nonempty intersection.


Proposition 2.48. The connected subsets of $\mathbb{R}$ are the intervals: open, closed,
and half open.


PROOF. Let E be a connected subset of $\mathbb{R}$, and suppose that there are real
numbers a, b, c such that $a < c < b$, a and b are in E, and c is not in E. Forming
the open sets $U = (-\infty, c)$ and $V = (c, +\infty)$ in $\mathbb{R}$, we see that E is the disjoint
union of $E \cap U$ and $E \cap V$ and that these two sets are nonempty. Thus E is not
connected.

Conversely suppose that I is an open, closed, or half-open interval of $\mathbb{R}$ from
a to b, with $a \ne b$ but with a or b or both allowed to be infinite. Arguing
by contradiction, suppose that I is not connected. Choose open sets U and
V in $\mathbb{R}$ such that I is the disjoint union of $I \cap U$ and $I \cap V$ and these two
sets are nonempty. Without loss of generality, there exist members c and c' of
$I \cap U$ and $I \cap V$, respectively, with $c < c'$. Since U is open and c has to be
$< b$, all real numbers $c + \epsilon$ with $\epsilon > 0$ sufficiently small are in $I \cap U$. Let
$d = \sup\{u \mid [c, u) \subseteq I \cap U\}$, so that $d > c$.

If $d < b$, then the fact that U is open implies that d is not in $I \cap U$. Thus d is
in $I \cap V$. Since V is open and $d > a$, $d - \epsilon$ is in $I \cap V$ if $\epsilon > 0$ is sufficiently
small. But then $d - \epsilon$ is in both $I \cap U$ and $I \cap V$ for $\epsilon$ sufficiently small. This is
a contradiction, and we conclude that $d = b$.

If $d = b$ is in $I \cap V$, then the same argument shows that $b - \epsilon$ is in both $I \cap U$
and $I \cap V$ for $\epsilon$ positive and sufficiently small, and we again have a contradiction.
Consequently all points from c to the right end of I are in $I \cap U$. This is again a
contradiction, since c' is known to be in $I \cap V$.


Proposition 2.49. The continuous image of a connected metric space is 
connected. 


PROOF. Let $(X, d)$ and $(Y, \rho)$ be metric spaces, and let $f : X \to Y$ be
continuous. We are to prove that $f(X)$ is connected. Corollary 2.27 shows that
there is no loss of generality in assuming that $f(X) = Y$, i.e., f is onto. Arguing
by contradiction, suppose that Y is the union $Y = U \cup V$ of disjoint nonempty
open sets. Then $X = f^{-1}(U) \cup f^{-1}(V)$ exhibits X as the disjoint union of
nonempty sets, and these sets are open as a consequence of Proposition 2.15a.
Thus X is not connected. 


Corollary 2.50 (Intermediate Value Theorem). For real-valued functions of a 
real variable, the continuous image of any interval is an interval. 


PROOF. This is immediate from Propositions 2.48 and 2.49. 


Further connected sets beyond those in R are typically built from other con- 
nected sets. One tool is a path in X, which is a continuous function from a closed 
bounded interval [a, b] into X. The image of a path is connected by Propositions 
2.48 and 2.49. A metric space (X, d) is pathwise connected if for any two points 
$x_1$ and $x_2$ in X, there is some path p from $x_1$ to $x_2$, i.e., if there is some continuous
$p : [a, b] \to X$ with $p(a) = x_1$ and $p(b) = x_2$.

A pathwise-connected metric space (X, d) is necessarily connected. In fact, 
otherwise we could write X as a disjoint union of two nonempty open sets U and 
V. Let $x_1$ be in U and $x_2$ be in V, and let $p : [a, b] \to X$ be a path from $x_1$ to
$x_2$. Then $p([a, b]) = (p([a, b]) \cap U) \cup (p([a, b]) \cap V)$ exhibits $p([a, b])$ as a
disjoint union of relatively open sets, and these sets are nonempty, since $x_1$ is in
the first set and $x_2$ is in the second set. Consequently $p([a, b])$ is not connected,
in contradiction to the fact that the image of any path is connected. 

We can view a pathwise-connected metric space as the union of images of 
paths from a single point to all other points, and such a union is then connected. 
The following proposition generalizes this construction.


Proposition 2.51. If $(X, d)$ is a metric space and $\{E_\alpha\}$ is a system of connected
subsets of X with a point $x_0$ in common, then $\bigcup_\alpha E_\alpha$ is connected.


PROOF. Assuming the contrary, find open sets U and V in X such that $\bigcup_\alpha E_\alpha$
is the disjoint union of its intersections with U and V and these two intersections
are both nonempty. Say that $x_0$ is in U. Since $E_\alpha$ is connected and $x_0$ is in $E_\alpha \cap U$,
the decomposition $E_\alpha = (E_\alpha \cap U) \cup (E_\alpha \cap V)$ forces $E_\alpha \cap V$ to be empty. Then
$\bigl(\bigcup_\alpha E_\alpha\bigr) \cap V = \bigcup_\alpha (E_\alpha \cap V)$ is empty, and we have arrived at a contradiction.


Proposition 2.52. If $(X, d)$ is a metric space and E is a connected subset of
X, then the closure $E^{\mathrm{cl}}$ is connected.


PROOF. Suppose that U and V are open sets in X such that $E^{\mathrm{cl}}$ is contained
in $U \cup V$ and $E^{\mathrm{cl}} \cap U \cap V$ is empty. We are to prove that $E^{\mathrm{cl}} \cap U$ and $E^{\mathrm{cl}} \cap V$
cannot both be nonempty. Arguing by contradiction, let x be in $E^{\mathrm{cl}} \cap U$ and let y
be in $E^{\mathrm{cl}} \cap V$. Since E is connected, $E \cap U$ and $E \cap V$ cannot both be nonempty,
and thus x and y cannot both be in E. Thus at least one of them, say x, is a limit
point of E. Since U is a neighborhood of x, U contains a point e of E different
from x. Thus e is in $E \cap U$. Since y cannot then be in $E \cap V$, y is a limit point
of E. Since V is a neighborhood of y, V contains a point f of E different from
y. Thus f is in $E \cap V$, and we have arrived at a contradiction.


EXAMPLE. The graph in $\mathbb{R}^2$ of $\sin(1/x)$ for $0 < x \le 1$ is pathwise connected,
and we have seen that pathwise-connected sets are connected. The closure of this
graph consists of the graph together with all points $(0, t)$ for $-1 \le t \le 1$, and
this closure is connected by Proposition 2.52. One can show, however, that this
closure is not pathwise connected. Thus we obtain an example of a connected set
in $\mathbb{R}^2$ that is not pathwise connected.


9. Baire Category Theorem 


A number of deep results in analysis depend critically on the fact that some metric 
space is complete. Already we have seen that the metric space C(S) of bounded
continuous scalar-valued functions on a metric space is complete, and we shall 
see as not too hard a consequence in Chapter XII that there exists a continuous 
periodic function whose Fourier series diverges at a point. One of the features 
of the Lebesgue integral in Chapter V will be that the metric spaces of integrable 
functions and of square-integrable functions, with their natural metrics, are further 
examples of complete metric spaces. Thus these spaces too are available for 
applications that make use of completeness. 

The main device through which completeness is transformed into a powerful 
hypothesis is the Baire Category Theorem below. A closed set in a metric space
is nowhere dense if its interior is empty. Its complement is an open dense set,
and conversely the complement of any open dense set is closed nowhere dense.


EXAMPLE. A nontrivial example of a closed nowhere dense set is a Cantor set⁴
in $\mathbb{R}$. This is a set constructed from a closed bounded interval of $\mathbb{R}$ by removing
an open interval in the middle of length a fraction $r_1$ of the total length with
$0 < r_1 < 1$, removing from each of the 2 remaining closed subintervals an open
interval in the middle of length a fraction $r_2$ of the total length of the subinterval,
removing from each of the 4 remaining closed subintervals an open interval in
the middle of length a fraction $r_3$ of the total length of the subinterval, and so on
indefinitely. The Cantor set is obtained as the intersection of the approximating
sets. It is closed, being the intersection of closed sets, and it is nowhere dense
because it contains no interval of more than one point. For the standard Cantor
set, the starting interval is [0, 1], and the fractions are given by $r_1 = r_2 = \dots = \tfrac13$
at every stage. In general, the “length” of the resulting set⁵ is the product of the
length of the starting interval and $\prod_{n=1}^{\infty} (1 - r_n)$.
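The product formula for the length can be checked stage by stage with exact rational arithmetic. The sketch below is an illustration only; the function names `cantor_stage_intervals` and `total_length` are assumptions for this example, and only finitely many stages are carried out, so the code exhibits the partial products rather than the infinite product.

```python
from fractions import Fraction

def cantor_stage_intervals(a, b, fractions):
    """Closed intervals left after removing, stage by stage, an open middle
    interval whose length is the given fraction of each remaining interval."""
    intervals = [(Fraction(a), Fraction(b))]
    for r in fractions:
        r = Fraction(r)
        next_intervals = []
        for lo, hi in intervals:
            # Each of the two surviving pieces has half of the retained length.
            keep = (1 - r) * (hi - lo) / 2
            next_intervals.append((lo, lo + keep))
            next_intervals.append((hi - keep, hi))
        intervals = next_intervals
    return intervals

def total_length(intervals):
    """Sum of the lengths of the given disjoint intervals."""
    return sum(hi - lo for lo, hi in intervals)

# Three stages of the standard construction on [0, 1] with r_1 = r_2 = r_3 = 1/3.
stage3 = cantor_stage_intervals(0, 1, [Fraction(1, 3)] * 3)
print(len(stage3), total_length(stage3))
```

Each stage multiplies the total length by the factor 1 - r for that stage, which is why the limiting length is the starting length times the infinite product.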


Theorem 2.53 (Baire Category Theorem). If (X,d) is a complete metric 
space, then 


(a) the intersection of countably many open dense sets is nonempty, 
(b) X is not the union of countably many closed nowhere dense sets. 


PROOF. Conclusions (a) and (b) are equivalent by taking complements. Let us
prove (a). Suppose that $U_n$ is open and dense for $n \ge 1$. Since $U_1$ is nonempty
and open, let $E_1$ be an open ball $B(r_1; x_1)$ whose closure is in $U_1$ and whose radius
is $r_1 \le 1$. We construct inductively open balls $E_n = B(r_n; x_n)$ with $r_n \le \tfrac1n$ such
that $E_n \subseteq U_1 \cap \dots \cap U_n$ and $E_n^{\mathrm{cl}} \subseteq E_{n-1}$. Suppose $E_n$ with $n \ge 1$ has been
constructed. Since $U_{n+1}$ is dense and $E_n$ is nonempty and open, $U_{n+1} \cap E_n$ is
not empty. Let $x_{n+1}$ be a point in $U_{n+1} \cap E_n$. Since $U_{n+1} \cap E_n$ is open, we can
find an open ball $E_{n+1} = B(r_{n+1}; x_{n+1})$ with radius $r_{n+1} \le \tfrac{1}{n+1}$ and center the
point $x_{n+1}$ in $U_{n+1}$ such that $E_{n+1}^{\mathrm{cl}} \subseteq U_{n+1} \cap E_n$. Then $E_{n+1}$ has the required
properties, and the inductive construction is complete. The sequence $\{x_n\}$ is
Cauchy because whenever $n \ge m$, the points $x_n$ and $x_m$ are both in $E_m$ and thus
have $d(x_n, x_m) < \tfrac1m$. Since X is by assumption complete, let $x_n \to x$. For any
integer N, the inequality $n > N$ implies that $x_n$ is in $E_{N+1}$. Thus the limit x is in
$E_{N+1}^{\mathrm{cl}} \subseteq E_N \subseteq U_1 \cap \dots \cap U_N$. Since N is arbitrary, x is in $\bigcap_{n=1}^{\infty} U_n$.


⁴ Often a mathematician who refers to “the” Cantor set is referring to what is called the “standard
Cantor set” later in the present paragraph.

⁵ To be precise, the length is the “Lebesgue measure” of the set in the sense to be defined in
Chapter V.


REMARK. In (a), the intersection in question is dense, not merely nonempty.
To see this, we observe in the first part of the proof that since $U_1$ is dense, $E_1$ can
be chosen to be arbitrarily close to any member of X and to have arbitrarily small
radius. Following through the construction, we see that x is in $E_1$ and hence can
be arranged to be as close as we want to any member of X. The corresponding
conclusion in (b) is that a nonempty open subset of X is never contained in the
countable union of closed nowhere dense sets.
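The inductive construction in the proof of Theorem 2.53 can be imitated concretely on the real line for open dense sets of the form R - {q}. The sketch below is an illustration under assumptions: closed intervals with rational endpoints stand in for the closed balls, each step keeps an outer third of the current interval, and the finite list `qs` is a hypothetical stand-in for the first terms of an enumeration of the rationals in [0, 1]. A finite computation cannot produce the limit point itself; it only shows the nested intervals shrinking while avoiding every listed point.

```python
from fractions import Fraction

def shrink_avoiding(lo, hi, q):
    """Return a closed outer third of [lo, hi] that does not contain q."""
    third = (hi - lo) / 3
    if not (lo <= q <= lo + third):
        return lo, lo + third
    # q lies in the left third, so it cannot lie in the right third.
    return hi - third, hi

def nested_intervals(points, lo=Fraction(0), hi=Fraction(1)):
    """Shrink [lo, hi] once for each listed point, avoiding that point."""
    for q in points:
        lo, hi = shrink_avoiding(lo, hi, q)
    return lo, hi

# Hypothetical finite list of rationals in [0, 1] (with repetitions allowed).
qs = [Fraction(p, d) for d in range(1, 8) for p in range(d + 1)]
lo, hi = nested_intervals(qs)
print(float(lo), float(hi))
```

Since the intervals are nested, the final interval misses every listed point, in the same way that the limit x in the proof lies in every set $U_N$.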


EXAMPLES. 


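(1) The subset $\mathbb{Q}$ of rationals in $\mathbb{R}$ is not the countable intersection of open
sets. In fact, assume the contrary, and write $\mathbb{Q} = \bigcap_{n=1}^{\infty} U_n$ with $U_n$ open. Each
set $U_n$ contains $\mathbb{Q}$ and hence is dense in $\mathbb{R}$. Also, for $q \in \mathbb{Q}$, the set $\mathbb{R} - \{q\}$ is
open and dense. Thus the equality $\mathbb{Q} = \bigcap_{n=1}^{\infty} U_n$ implies that


$$\Bigl( \bigcap_{n=1}^{\infty} U_n \Bigr) \cap \Bigl( \bigcap_{q \in \mathbb{Q}} (\mathbb{R} - \{q\}) \Bigr)$$


is empty, in contradiction to Theorem 2.53.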


(2) Let us start with a Cantor set as at the beginning of this section. The total 
interval is to be [0, 1], and the set is to be built with middle segments of fractions 
$r_1, r_2, \dots$. Within the closure of each removed open interval, we insert a Cantor
set for that interval, possibly with different fractions $r_1, r_2, \dots$ for each inserted
Cantor set. This insertion involves further removed open intervals, and we insert 
a Cantor set into each of these. We continue this process indefinitely. The union 
of the constructed sets is dense. Can it be the entire interval [0, 1]? The answer 
is “no” because each of the Cantor sets is closed nowhere dense and because by 
Theorem 2.53, the interval [0, 1] is not the countable union of closed nowhere 
dense sets. 


A subset E of a metric space is said to be of the first category if it is contained
in the countable union of closed nowhere dense sets. Theorem 2.53 and the 
remark after it together imply that no nonempty open set in a complete metric 
space is of the first category. 


Theorem 2.54. Let $(X, d)$ be a complete metric space, and let U be an open
subset of X. Suppose for $n \ge 1$ that $f_n : U \to \mathbb{C}$ is a continuous function and that
$f_n$ converges pointwise to a function $f : U \to \mathbb{C}$. Then the set of discontinuities
of f is of the first category.


The proof will make use of the notion of the oscillation of a function. For any 
function $g : U \to \mathbb{C}$, define


$$\operatorname{osc}_g(x_0) = \lim_{\delta \downarrow 0} \, \sup_{x \in B(\delta; x_0)} |g(x) - g(x_0)|,$$

so that g is continuous at $x_0$ if and only if $\operatorname{osc}_g(x_0) = 0$. At first glance it
might seem that the sets $\{x \mid \operatorname{osc}_g(x) \ge r\}$ are always closed, no matter what
discontinuities g has. Actually, these sets need not be closed. Take, for example,
the function $g : \mathbb{R} \to \mathbb{R}$ that is 1 at every nonzero rational, 0 at every irrational,
and 1/2 at 0. Then $\operatorname{osc}_g(x)$ is 1 at every x in $\mathbb{R}$ except for $x = 0$, where it is 1/2.
Thus, in this example, the set $\{x \mid \operatorname{osc}_g(x) \ge 1\}$ is $\mathbb{R} - \{0\}$ and is not closed.


Lemma 2.55. Let $(X, d)$ be a complete metric space, and let U be an open
subset of X. If $g : U \to \mathbb{C}$ is a function and $\epsilon > 0$ is a positive number, then

$$\{x \in U \mid \operatorname{osc}_g(x) \ge 2\epsilon\}^{\mathrm{cl}} \subseteq \{x \in U \mid \operatorname{osc}_g(x) \ge \tfrac{\epsilon}{2}\}.$$

PROOF. We need to see that the limit points of the set on the left are in the
set on the right. Thus suppose that $\operatorname{osc}_g(x_n) \ge 2\epsilon$ for all n and that $x_n \to x_0$.
For each n, choose $x_{n,m}$ such that $\lim_m x_{n,m} = x_n$ and $|g(x_{n,m}) - g(x_n)| \ge \epsilon$ for
all m. Because of the convergence of $x_{n,m}$ to $x_n$, we may choose, for each n, an
integer $m = m_n$ such that $d(x_{n,m_n}, x_n) < d(x_0, x_n)$, and then $\lim_n x_{n,m_n} = x_0$
by the triangle inequality. From $|g(x_{n,m_n}) - g(x_n)| \ge \epsilon$, the triangle inequality
forces

$$|g(x_{n,m_n}) - g(x_0)| \ge \frac{\epsilon}{2} \quad \text{or} \quad |g(x_n) - g(x_0)| \ge \frac{\epsilon}{2}. \tag{$*$}$$

Defining $y_n$ to be $x_{n,m_n}$ or $x_n$ according as the first or second inequality is the case
in $(*)$, we have $y_n \to x_0$ and $|g(y_n) - g(x_0)| \ge \tfrac{\epsilon}{2}$. This proves the lemma.

PROOF OF THEOREM 2.54. In view of Lemma 2.55 and the fact that U is not
of first category (Theorem 2.53 and the remark afterward), it is enough to prove
for each $\epsilon > 0$ that $\{x \mid \operatorname{osc}_f(x) \ge \epsilon\}$ does not contain a nonempty open subset
of X. Assuming the contrary, suppose that it contains the nonempty open set V.
Define


$$A_{mn} = \{x \in V \mid |f_m(x) - f_n(x)| \le \tfrac{\epsilon}{4}\}.$$


This is a relatively closed subset of V. Then $A_m = \bigcap_{n \ge m} A_{mn}$ is closed in V. If
x is in V, the fact that $\{f_n(x)\}$ is a Cauchy sequence implies that there is some m
such that x is in $A_{mn}$ for all $n \ge m$. Hence $\bigcup_{m=1}^{\infty} A_m = V$. Again by Theorem
2.53 and the remark after it, some $A_m$ has nonempty interior. Fix that m, and let
W be its nonempty interior. Since


$$A_m \subseteq \{x \in V \mid |f_m(x) - f(x)| \le \tfrac{\epsilon}{4}\},$$


every point of W has $|f_m(x) - f(x)| \le \tfrac{\epsilon}{4}$ and $\operatorname{osc}_f(x) \ge \epsilon$. Let $x_0$ be in W and
choose $x_n$ tending to $x_0$ with $|f(x_n) - f(x_0)| \ge \tfrac{3\epsilon}{4}$. From $|f_m(x_n) - f(x_n)| \le \tfrac{\epsilon}{4}$
and $|f_m(x_0) - f(x_0)| \le \tfrac{\epsilon}{4}$, we obtain $|f_m(x_n) - f_m(x_0)| \ge \tfrac{\epsilon}{4}$. Since $x_n$ converges
to $x_0$, this inequality contradicts the continuity of $f_m$ at $x_0$.


10. Properties of C(S) for Compact Metric S


If (S, d) is a metric space, then we saw in Proposition 2.44 that the vector space
B(S) of bounded scalar-valued functions on S, in the uniform metric, is a complete
metric space. We saw also in Corollary 2.45 that the vector subspace C(S) of
bounded continuous functions is a complete subspace. In this section we shall
study the space C(S) further under the assumption that S is compact. In this case
Propositions 2.38 and 2.34 tell us that every continuous scalar-valued function
on S is automatically bounded and hence is in C(S).

The first result about C(S) for S compact is a generalization of Ascoli’s 
Theorem from its setting in Theorem 1.22 for real-valued functions on a bounded 
interval [a, b]. The generalized theorem provides an insight that is not so obvious 
from the special case that S is a closed bounded interval of $\mathbb{R}$. The insight is a
characterization of the compact subsets of C(S) when S is compact, and it is
stated precisely in Corollary 2.57 below. The relevant definitions for Ascoli’s
Theorem are generalized in the expected way. Let $\mathcal{F} = \{f_\alpha \mid \alpha \in A\}$ be a
set of scalar-valued functions on the compact metric space S. We say that $\mathcal{F}$
is equicontinuous at $x \in S$ if for each $\epsilon > 0$, there is some $\delta > 0$ such that
$d(t, x) < \delta$ implies $|f(t) - f(x)| < \epsilon$ for all $f \in \mathcal{F}$. The set $\mathcal{F}$ of functions
is pointwise bounded if for each $t \in S$, there exists a number $M_t$ such that
$|f(t)| \le M_t$ for all $f \in \mathcal{F}$. The set is uniformly equicontinuous on S if it is
equicontinuous at each point $x \in S$ and if the $\delta$ can be taken independent of x.
The set is uniformly bounded on S if it is pointwise bounded at each $t \in S$ and
the bound $M_t$ can be taken independent of t; this last definition is consistent with
the definition of a uniformly bounded sequence of functions given in Section 4. 


Theorem 2.56 (Ascoli’s Theorem). Let (S, d) be a compact metric space. If 
$\{f_n\}$ is a sequence of scalar-valued functions on S that is equicontinuous at each
point of S and pointwise bounded on S, then


(a) $\{f_n\}$ is uniformly equicontinuous and uniformly bounded on S,
(b) $\{f_n\}$ has a uniformly convergent subsequence.


REMARKS. The proof involves only notational changes from the special case 
Theorem 1.22; there are enough such changes, however, so that it is worth writing 
out the details. Inspection of this proof shows also that the range R or C may be 
replaced by any compact metric space. We shall see a further generalization of 
this theorem in Chapter X, and the proof at that time will look quite different. 


PROOF. Since each $f_n$ is continuous at each point, we know from Propositions
2.38, 2.34a, and 2.41 that each $f_n$ is uniformly continuous and bounded. The
proof of (a) amounts to an argument that the estimates in those theorems can be
arranged to apply simultaneously for all n.

First consider the question of uniform boundedness. Choose, by Corollary
2.39, some $x_n$ in S with $|f_n(x_n)|$ equal to $K_n = \sup_{x \in S} |f_n(x)|$. Then choose
a subsequence on which the numbers $K_n$ tend to $\sup_n K_n$ in $\mathbb{R}^*$. There will be
no loss of generality in assuming that this subsequence is our whole sequence.
By compactness of S, apply the Bolzano–Weierstrass property given in Theorem
2.36 to find a convergent subsequence $\{x_{n_k}\}$ of $\{x_n\}$, and let $x_0$ be the limit of this
subsequence. By pointwise boundedness, find $M_{x_0}$ with $|f_n(x_0)| \le M_{x_0}$ for all
n. Then choose some $\delta$ of equicontinuity at $x_0$ for $\epsilon = 1$. As soon as k is large
enough so that $d(x_{n_k}, x_0) < \delta$, we have


$$K_{n_k} = |f_{n_k}(x_{n_k})| \le |f_{n_k}(x_{n_k}) - f_{n_k}(x_0)| + |f_{n_k}(x_0)| < 1 + M_{x_0}.$$


Thus $1 + M_{x_0}$ is a uniform bound for the functions $f_n$.
For the uniform equicontinuity, fix $\epsilon > 0$. The uniform continuity of $f_n$ for
each n, as given in Proposition 2.41, means that it makes sense to define


$$\delta_n(\epsilon) = \min\Bigl\{ 1, \ \sup\bigl\{ \delta' > 0 \bigm| |f_n(x) - f_n(y)| < \epsilon \text{ whenever } d(x, y) < \delta' \text{ and } x \text{ and } y \text{ are in } S \bigr\} \Bigr\}.$$


If $d(x, y) < \delta_n(\epsilon)$, then $|f_n(x) - f_n(y)| < \epsilon$. Put $\delta(\epsilon) = \inf_n \delta_n(\epsilon)$. Let us see
that it is enough to prove that $\delta(\epsilon) > 0$: If x and y are in S with $d(x, y) < \delta(\epsilon)$,
then $d(x, y) < \delta(\epsilon) \le \delta_n(\epsilon)$. Hence $|f_n(x) - f_n(y)| < \epsilon$ as required.

Thus we are to prove that $\delta(\epsilon) > 0$. If $\delta(\epsilon) = 0$, then we first choose a strictly
increasing sequence $\{n_k\}$ of positive integers such that $\delta_{n_k}(\epsilon) < \tfrac1k$, and we next
choose $x_k$ and $y_k$ in S with $d(x_k, y_k) < \tfrac1k$ and $|f_{n_k}(x_k) - f_{n_k}(y_k)| \ge \epsilon$; such points
exist because $\tfrac1k$ exceeds the supremum that defines $\delta_{n_k}(\epsilon)$. Using
the Bolzano–Weierstrass property again, we obtain a subsequence $\{x_{k_l}\}$ of $\{x_k\}$
such that $\{x_{k_l}\}$ converges, say to a limit $x_0$. Then


$$\limsup_l d(y_{k_l}, x_0) \le \limsup_l d(y_{k_l}, x_{k_l}) + \limsup_l d(x_{k_l}, x_0) = 0 + 0 = 0,$$


so that $\{y_{k_l}\}$ converges to $x_0$. Now choose, by equicontinuity at $x_0$, a number
$\delta' > 0$ such that $|f_n(x) - f_n(x_0)| < \tfrac{\epsilon}{2}$ for all n whenever $d(x, x_0) < \delta'$. The
convergence of $\{x_{k_l}\}$ and $\{y_{k_l}\}$ to $x_0$ implies that for large enough l, we have
$d(x_{k_l}, x_0) < \delta'$ and $d(y_{k_l}, x_0) < \delta'$. Therefore $|f_{n_{k_l}}(x_{k_l}) - f_{n_{k_l}}(x_0)| < \tfrac{\epsilon}{2}$ and
$|f_{n_{k_l}}(y_{k_l}) - f_{n_{k_l}}(x_0)| < \tfrac{\epsilon}{2}$, from which we conclude that $|f_{n_{k_l}}(x_{k_l}) - f_{n_{k_l}}(y_{k_l})| < \epsilon$.
But we saw that $|f_{n_k}(x_k) - f_{n_k}(y_k)| \ge \epsilon$ for all k, and thus we have arrived at a
contradiction. This proves the uniform equicontinuity and completes the proof
of (a).

To prove (b), let $R$ be a compact set containing all sets $\mathrm{image}(f_n)$. Choose
a countable dense set $D$ in $S$ by Proposition 2.33. Using the Cantor diagonal
process and the Bolzano-Weierstrass property of $R$, we construct a subsequence
$\{f_{n_k}\}$ of $\{f_n\}$ that is convergent at every point in $D$. Let us prove that $\{f_{n_k}\}$ is
uniformly Cauchy. Let $\epsilon > 0$ be given, and let $\delta$ be some corresponding number
exhibiting equicontinuity. The balls $B(\delta; r)$ centered at the members $r$ of $D$ cover
$S$, and the compactness of $S$ gives us finitely many of their centers $r_1, \dots, r_l$ such
that any member of $S$ is within $\delta$ of at least one of $r_1, \dots, r_l$. Then choose $N$
with $|f_n(r_j) - f_m(r_j)| < \epsilon$ for $1 \le j \le l$ whenever $n$ and $m$ are $\ge N$. If $x$ is in $S$,
let $r(x)$ be an $r_j$ with $d(x, r(x)) < \delta$. Whenever $n$ and $m$ are $\ge N$, we then have

$$|f_n(x) - f_m(x)| \le |f_n(x) - f_n(r(x))| + |f_n(r(x)) - f_m(r(x))| + |f_m(r(x)) - f_m(x)| < \epsilon + \epsilon + \epsilon = 3\epsilon.$$

Hence $\{f_{n_k}\}$ is uniformly Cauchy, and (b) follows since the metric space $C(S)$ is
complete.


Corollary 2.57. If (S, d) is a compact metric space, then a subset E of C(S) 
in the uniform metric has compact closure if and only if E is uniformly bounded 
and uniformly equicontinuous. 


PROOF. First let us see that if $E$ is uniformly bounded and uniformly equicontinuous,
then so is $E^{\mathrm{cl}}$. In fact, if $|f(x)| \le M$ for $f \in E$, then the same thing is
true of any uniform limit of such functions. Hence $E^{\mathrm{cl}}$ is uniformly bounded. For
the uniform equicontinuity of $E^{\mathrm{cl}}$, let $\epsilon$ be given, and find some $\delta$ of equicontinuity
for $\epsilon$ and the members of $E$. If $f$ is a limit point of $E$, we can find a sequence
$\{f_n\}$ in $E$ converging uniformly to $f$. If $d(x, y) < \delta$, then the inequality

$$|f(x) - f(y)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)|$$

and the uniform convergence show that we obtain $|f(x) - f(y)| < 3\epsilon$ by fixing
any sufficiently large $n$. Thus $E^{\mathrm{cl}}$ is uniformly equicontinuous.

Now suppose that $E$ is a closed subset of $C(S)$ that is uniformly bounded
and equicontinuous. Then Theorem 2.56 shows that any sequence in $E$ has
a subsequence that is convergent in $C(S)$. Since $E$ is closed, the subsequence is
convergent in $E$. Theorem 2.36 then shows that $E$ is compact.

Conversely suppose that $E$ is compact in $C(S)$. Distance from 0 in $C(S)$ is a
continuous real-valued function by Corollary 2.17, and this continuous function
has to be bounded on the compact set $E$. Thus $E$ is uniformly bounded. For the
uniform equicontinuity, let $\epsilon > 0$ be given. Theorem 2.46 shows that $E$ is totally
bounded. Hence we can find a finite set $f_1, \dots, f_l$ in $E$ such that each member $f$
of $E$ has $\sup_{x \in S} |f(x) - f_j(x)| < \epsilon$ for some $j$. By uniform continuity of each
$f_i$, choose some number $\delta > 0$ such that $d(x, y) < \delta$ implies $|f_i(x) - f_i(y)| < \epsilon$
for $1 \le i \le l$. If $f_j$ is the member of the finite set associated with $f$, then
$d(x, y) < \delta$ implies

$$|f(x) - f(y)| \le |f(x) - f_j(x)| + |f_j(x) - f_j(y)| + |f_j(y) - f(y)| < 3\epsilon.$$

Hence $E$ is uniformly equicontinuous.
The second result about $C(S)$ when $S$ is compact generalizes the Weierstrass
Approximation Theorem (Theorem 1.52) of Section I.9. We shall make use of a
special case of the Weierstrass theorem in the proof, namely that $|x|$ is the uniform limit
on $[-1, 1]$ of polynomials $P_n(x)$ with $P_n(0) = 0$. This special case was proved
also by a direct argument in Section I.8.

Let us distinguish the case of real-valued functions from that of complex- 
valued functions, writing C(S, R) and C(S, C) in the two cases. The theorem in 
question gives a sufficient condition for a “subalgebra” of C(S, R) or C(S, C) to 
be dense in the whole space in the uniform metric. Pointwise addition and scalar 
multiplication make C(S, R) into a real vector space and C(S, C) into a complex 
vector space, and each space has also the operation of pointwise multiplication; all 
of these operations on functions preserve continuity as a consequence of Corollary 
2.29. By a subalgebra of C(S, R) or C(S, C), we mean any nonempty subset that
is closed under all these operations. The space C(S, C) has also the operation 
of complex conjugation; this again preserves continuity by Corollary 2.29. 

We shall work with a subalgebra of C(S, R) or of C(S,C), and we shall 
assume that the subalgebra is closed under complex conjugation in the case of 
complex scalars. The closure of such a subalgebra in the uniform metric is again 
a subalgebra. To see that this closure is a subalgebra requires checking each 
operation separately, and we confine our attention to pointwise multiplication. If 
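sequences $\{f_n\}$ and $\{g_n\}$ converge uniformly to $f$ and $g$, then $\{f_n g_n\}$ converges
uniformly to $fg$ because

$$\begin{aligned}
\sup_{x \in S} |f_n(x) g_n(x) - f(x) g(x)|
&\le \sup_{x \in S} |f_n(x)(g_n(x) - g(x))| + \sup_{x \in S} |(f_n(x) - f(x)) g(x)| \\
&\le \Big(\sup_{x \in S} |f_n(x)|\Big)\Big(\sup_{x \in S} |g_n(x) - g(x)|\Big) + \Big(\sup_{x \in S} |g(x)|\Big)\Big(\sup_{x \in S} |f_n(x) - f(x)|\Big)
\end{aligned}$$

with $\sup_{x \in S} |g(x)|$ finite and $\sup_{x \in S} |f_n(x)|$ convergent to $\sup_{x \in S} |f(x)|$.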
We say that a subalgebra of C(S, R) or C(S, C) separates points if for each
pair of distinct points $x_1$ and $x_2$ in $S$, there is some $f$ in the subalgebra with

$$f(x_1) \ne f(x_2).$$


Theorem 2.58 (Stone—Weierstrass Theorem). Let (S, d) be a compact metric 
space. 


(a) If A is a subalgebra of C(S, R) that separates points and contains the 
constant functions, then A is dense in C(S, R) in the uniform metric. 

(b) If A is a subalgebra of C(S, C) that separates points, contains the constant
functions, and is closed under complex conjugation, then A is dense in 
C(S, C) in the uniform metric. 
PROOF OF (a). Let $A^{\mathrm{cl}}$ be the closure of $A$ in the uniform metric. We recalled
above from Chapter I that $|t|$ is the limit of polynomials $t \mapsto P_n(t)$ uniformly on
$[-1, 1]$. It follows that $|t|$ is the limit of polynomials $t \mapsto Q_n(t) = M P_n(M^{-1} t)$
uniformly on $[-M, M]$. Taking $M = \sup_{x \in S} |f(x)|$, we see that $|f|$ is in $A^{\mathrm{cl}}$
whenever $f$ is in $A$.

Since $A^{\mathrm{cl}}$ is a subalgebra closed under addition and scalar multiplication as
well, the formulas

$$\max\{f, g\} = \tfrac{1}{2}(f + g) + \tfrac{1}{2}|f - g|,$$
$$\min\{f, g\} = \tfrac{1}{2}(f + g) - \tfrac{1}{2}|f - g|,$$

show that $A^{\mathrm{cl}}$ is closed under pointwise maximum and pointwise minimum for
two functions. Iterating, we see that $A^{\mathrm{cl}}$ is closed under pointwise maximum and
pointwise minimum for $n$ functions for any integer $n \ge 2$.
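
As a numerical sanity check (not part of the proof), the following Python sketch evaluates the expressions $\frac{1}{2}(f + g) \pm \frac{1}{2}|f - g|$ on a grid and compares them with the pointwise maximum and minimum. The sample functions `f` and `g`, the grid, and the helper names are arbitrary choices made here for illustration.

```python
import math

def pointwise_max(f, g):
    """max{f, g} written as (f + g)/2 + |f - g|/2."""
    return lambda s: 0.5 * (f(s) + g(s)) + 0.5 * abs(f(s) - g(s))

def pointwise_min(f, g):
    """min{f, g} written as (f + g)/2 - |f - g|/2."""
    return lambda s: 0.5 * (f(s) + g(s)) - 0.5 * abs(f(s) - g(s))

# Hypothetical sample functions on S = [0, 1]; any continuous functions would do.
f = math.sin
g = lambda s: s * s

grid = [k / 100 for k in range(101)]
max_err = max(abs(pointwise_max(f, g)(s) - max(f(s), g(s))) for s in grid)
min_err = max(abs(pointwise_min(f, g)(s) - min(f(s), g(s))) for s in grid)
print(max_err < 1e-12, min_err < 1e-12)
```

The point of the formulas is that they use only addition, scalar multiplication, and absolute value, which are exactly the operations already known to preserve the closure of the subalgebra.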

The heart of the proof is an argument that if $f \in C(S, \mathbb{R})$, $x \in S$, and $\epsilon > 0$
are given, then there exists $g_x$ in $A^{\mathrm{cl}}$ such that $g_x(x) = f(x)$ and

$$g_x(s) > f(s) - \epsilon$$

for all $s \in S$. The argument is as follows: For each $y \in S$ other than $x$, there exists
a function in $A$ taking distinct values at $x$ and $y$. Some linear combination of this
function and the constant function 1 is a function $h_y$ in $A$ with $h_y(x) = f(x)$
and $h_y(y) = f(y)$. To complete the definition of $h_y$ for all $y \in S$, we set
$h_x$ equal to the constant function $f(x)1$. The continuity of $h_y$ and the equality
$h_y(y) = f(y)$ imply that there exists an open neighborhood $U_y$ of $y$ such that
$h_y(s) > f(s) - \epsilon$ for all $s \in U_y$. As $y$ varies, these open neighborhoods cover
$S$, and by compactness of $S$, finitely many suffice, say $U_{y_1}, \dots, U_{y_k}$. Then the
function $g_x = \max\{h_{y_1}, \dots, h_{y_k}\}$ has $g_x(s) > f(s) - \epsilon$ for all $s \in S$. Also, it
has $g_x(x) = f(x)$, and it is in $A^{\mathrm{cl}}$, since $A^{\mathrm{cl}}$ is closed under pointwise maxima.

To complete the proof of (a), we continue with $f \in C(S, \mathbb{R})$ and $\epsilon > 0$ as
above. We shall produce a member $h$ of $A^{\mathrm{cl}}$ such that $|h(s) - f(s)| < \epsilon$ for all
$s \in S$. For each $x$, the continuity of $g_x$ and the equality $g_x(x) = f(x)$ imply
that there is an open neighborhood $V_x$ of $x$ such that $g_x(s) < f(s) + \epsilon$ for all
$s \in V_x$. As $x$ varies, these open neighborhoods cover $S$, and by compactness of
$S$, finitely many suffice, say $V_{x_1}, \dots, V_{x_l}$. The function $h = \min\{g_{x_1}, \dots, g_{x_l}\}$ has
$h(s) < f(s) + \epsilon$ for all $s \in S$, and it is in $(A^{\mathrm{cl}})^{\mathrm{cl}} = A^{\mathrm{cl}}$, since each $g_{x_j}$ is in $A^{\mathrm{cl}}$.
Since each $g_{x_j}$ has $g_{x_j}(s) > f(s) - \epsilon$ for all $s \in S$, we have $h(s) > f(s) - \epsilon$ as
well. Thus $|h(s) - f(s)| < \epsilon$ for all $s \in S$.

Since $\epsilon$ is arbitrary, we conclude that $f$ is a limit point of $A^{\mathrm{cl}}$. But $A^{\mathrm{cl}}$ is
closed, and hence $f$ is in $A^{\mathrm{cl}}$. Therefore $A^{\mathrm{cl}} = C(S, \mathbb{R})$.
PROOF OF (b). Let $A_{\mathbb{R}}$ be the subset of members of $A$ that take values in $\mathbb{R}$.
Then $A_{\mathbb{R}}$ is certainly closed under addition, multiplication by real scalars, and
pointwise multiplication, and the real-valued constant functions are in $A_{\mathbb{R}}$. If
$f = u + iv$ is in $A$ and has real and imaginary parts $u$ and $v$, then $\bar{f}$ is in $A$
by assumption, and hence so are $u = \frac{1}{2}(f + \bar{f})$ and $v = \frac{1}{2i}(f - \bar{f})$. We are
given that $A$ separates points of $S$. If $x_1$ and $x_2$ are distinct points of $S$ with
$f(x_1) \ne f(x_2)$, then either $u(x_1) \ne u(x_2)$ or $v(x_1) \ne v(x_2)$, and it follows that
$A_{\mathbb{R}}$ separates points. By (a), $A_{\mathbb{R}}$ is dense in $C(S, \mathbb{R})$. Finally let $f = u + iv$ be
in $C(S, \mathbb{C})$, and let $\{u_n\}$ and $\{v_n\}$ be sequences in $A_{\mathbb{R}}$ converging uniformly to $u$
and $v$, respectively. Then $\{u_n + i v_n\}$ is a sequence in $A$ converging uniformly to
$f$. Hence $A$ is dense in $C(S, \mathbb{C})$.


EXAMPLES. 

(1) On a closed bounded interval [a, b] of the line, the scalar-valued polyno- 
mials form an algebra that separates points, contains the constants, and is closed 
under conjugation. The Stone—Weierstrass Theorem in this case reduces to the 
Weierstrass Theorem (Theorem 1.52), saying that the polynomials are dense in 
C([a, b]). 

(2) Consider the algebra of continuous complex-valued periodic functions
on $[-\pi, \pi]$ and the subalgebra of complex-valued trigonometric polynomials
$\sum_{n=-N}^{N} c_n e^{inx}$; here $N$ depends on the trigonometric polynomial. Neither the
algebra nor the subalgebra separates points, since all functions in question have
$f(-\pi) = f(\pi)$. To make the theorem applicable, we consider the domain of
these functions to be the unit circle of $\mathbb{C}$, parametrized by $e^{ix}$; this parametrization
is permissible by Corollary 1.45, and continuity is preserved. The Stone-Weierstrass
Theorem then applies and gives a new proof that the trigonometric
polynomials are dense in the space of complex-valued continuous periodic functions;
our earlier proof was constructive, deducing the result as part of Fejér's
Theorem (Theorem 1.59).

(3) Let $S^{n-1}$ be the unit sphere $\{x \in \mathbb{R}^n \mid |x| = 1\}$ in $\mathbb{R}^n$. The restrictions
to $S^{n-1}$ of all scalar-valued polynomials $P(x_1, \dots, x_n)$ in $n$ variables form a
subalgebra of $C(S^{n-1})$ that separates points, contains the constants, and is closed
under conjugation. The Stone-Weierstrass Theorem says that this subalgebra is
dense in $C(S^{n-1})$.

(4) Let $S$ be the closed unit disk $\{z \mid |z| \le 1\}$ in $\mathbb{C}$. The set $A$ of restrictions to
$S$ of sums of power series having infinite radius of convergence is a subalgebra of
$C(S, \mathbb{C})$ that separates points and contains the constants. However, the continuous
function $\bar{z}$ is not in the closure, because it has integral 0 over $S$ with every member
of $A$ and also with uniform limits on $S$ of members of $A$. This example shows the
need for some hypothesis like "closed under complex conjugation" in Theorem
2.58b.
Corollary 2.59. If (S, d) is a compact metric space, then C(S) is separable as
a metric space.


PROOF. It is enough to consider $C(S, \mathbb{C})$, since $C(S, \mathbb{R})$ is a metric subspace
of $C(S, \mathbb{C})$. Being compact metric, $S$ is separable by Proposition 2.33. Let $\mathcal{B}$ be
a countable base of $S$. The number of pairs $(U, V)$ of members of $\mathcal{B}$ such that
$U^{\mathrm{cl}} \subseteq V$ is countable. By Proposition 2.30e, there exists a continuous function
$f_{UV} : S \to \mathbb{R}$ such that $f_{UV}$ is 1 on $U^{\mathrm{cl}}$ and $f_{UV}$ is 0 on $V^c$. Let us show that
the system of functions $f_{UV}$ separates points of $S$.

If $x_1$ and $x_2$ are given, the $T_1$ property of $S$ (Proposition 2.30a), when combined
with Proposition 2.31, gives us a member $V$ of $\mathcal{B}$ such that $x_1$ is in $V$ and $V \subseteq
\{x_2\}^c$. Since the set $V^c$ is closed and does not contain $x_1$, the property that $S$ is
regular (Proposition 2.30c) gives us disjoint open sets $U_1$ and $V_1$ with $x_1 \in U_1$
and $V^c \subseteq V_1$. The latter condition means that $V \supseteq V_1^c$. By Proposition 2.31
let $U$ be a basic open set with $x_1 \in U$ and $U \subseteq U_1$. Then we have $x_1 \in U \subseteq
U_1 \subseteq U_1^{\mathrm{cl}} \subseteq V_1^c \subseteq V$ and hence also $x_1 \in U \subseteq U^{\mathrm{cl}} \subseteq V$. The function $f_{UV}$ is
therefore 1 on $x_1$ and 0 on $x_2$, and the system of functions $f_{UV}$ separates points.

The set of all finite products of functions $f_{UV}$ and the constant function 1
is countable, and so is the set $D$ of linear combinations of all these functions
with coefficients of the form $q_1 + i q_2$ with $q_1$ and $q_2$ rational. The claim is that
this countable set $D$ is dense in $C(S, \mathbb{C})$. The closure of $D$ certainly contains the
algebra $A$ of all complex linear combinations of the function 1 and arbitrary finite
products of functions $f_{UV}$, and $A$ is closed under complex conjugation. By the
Stone-Weierstrass Theorem (Theorem 2.58), $A^{\mathrm{cl}} = C(S, \mathbb{C})$. Since $D^{\mathrm{cl}}$ contains
$A$, we have $C(S, \mathbb{C}) = A^{\mathrm{cl}} \subseteq (D^{\mathrm{cl}})^{\mathrm{cl}} = D^{\mathrm{cl}}$. In other words, $D$ is dense.
If $(X, d)$ and $(Y, \rho)$ are two metric spaces, an isometry of $X$ into $Y$ is a function
$\varphi : X \to Y$ that preserves distances: $\rho(\varphi(x_1), \varphi(x_2)) = d(x_1, x_2)$ for all $x_1$ and
$x_2$ in $X$. For example, a rotation $(x, y) \mapsto (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta)$ is
an isometry of $\mathbb{R}^2$ with itself. An isometry is necessarily continuous (with $\delta = \epsilon$).
However, an isometry need not have the whole range as image. For example, the
map $x \mapsto (x, 0)$ of $\mathbb{R}^1$ into $\mathbb{R}^2$ is an isometry that is not onto $\mathbb{R}^2$. In the case that
there exists an isometry of $X$ onto $Y$, we say that $X$ and $Y$ are isometric.
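
The two examples of isometries just mentioned, the rotation of the plane and the map of the line into the plane sending x to (x, 0), can be spot-checked numerically. The Python sketch below is only an illustration; the angle, the sample points, and the function names are assumptions made here.

```python
import math

def rotate(point, theta):
    """Rotate a point of R^2 about the origin through the angle theta."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def embed(x):
    """The map x -> (x, 0) of R^1 into R^2."""
    return (x, 0.0)

def dist(p, q):
    """Euclidean distance in R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

theta = 0.7                      # arbitrary angle chosen for the illustration
p, q = (1.0, 2.0), (-3.0, 0.5)   # arbitrary sample points
rotation_ok = math.isclose(dist(rotate(p, theta), rotate(q, theta)), dist(p, q))
embedding_ok = math.isclose(dist(embed(2.0), embed(-1.5)), abs(2.0 - (-1.5)))
print(rotation_ok, embedding_ok)
```

A finite check of this kind only tests particular points; the general statement follows from the identity cos²θ + sin²θ = 1.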


Theorem 2.60. If $(X, d)$ is a metric space, then there exist a complete metric
space $(X^*, \Delta)$ and an isometry $\varphi : X \to X^*$ such that the image of $X$ in $X^*$ is
dense.


REMARK. It is observed in Problems 25-26 at the end of the chapter that
$(X^*, \Delta)$ and $\varphi : X \to X^*$ are essentially unique. The metric space $(X^*, \Delta)$ is
called a completion of $(X, d)$, or sometimes "the" completion because of the
essential uniqueness. There is more than one construction of $X^*$, and the proof
below will use a construction by Cauchy sequences that is immediately suggested
if $X$ is the set of rationals and $X^*$ is the set of reals.


PROOF. Let Cauchy(X) be the set of all Cauchy sequences in $X$. Define a
relation $\sim$ on Cauchy(X) as follows: if $\{p_n\}$ and $\{q_n\}$ are in Cauchy(X), then $\{p_n\} \sim \{q_n\}$
means $\lim d(p_n, q_n) = 0$.

Let us prove that $\sim$ is an equivalence relation. It is reflexive, i.e., has $\{p_n\} \sim
\{p_n\}$, because $d(p_n, p_n) = 0$ for all $n$. It is symmetric, i.e., has the property that
$\{p_n\} \sim \{q_n\}$ implies $\{q_n\} \sim \{p_n\}$, because $d(p_n, q_n) = d(q_n, p_n)$. It is transitive,
i.e., has the property that $\{p_n\} \sim \{q_n\}$ and $\{q_n\} \sim \{r_n\}$ together imply $\{p_n\} \sim \{r_n\}$,
because

$$0 \le d(p_n, r_n) \le d(p_n, q_n) + d(q_n, r_n)$$

and each term on the right side is tending to 0. Thus $\sim$ is an equivalence relation.
Let $X^*$ be the set of equivalence classes. If $P$ and $Q$ are two equivalence
classes, we set

$$\Delta(P, Q) = \lim d(p_n, q_n), \qquad (*)$$

where $\{p_n\}$ is a member of the class $P$ and $\{q_n\}$ is a member of the class $Q$. We
have to prove that the limit in (*) exists in $\mathbb{R}$ and then that the limit is independent
of the choice of representatives of $P$ and $Q$.

For the existence of the limit (*), it is enough to prove that the sequence
$\{d(p_n, q_n)\}$ is Cauchy. The triangle inequality gives

$$d(p_n, q_n) \le d(p_n, p_m) + d(p_m, q_m) + d(q_m, q_n)$$

and hence $d(p_n, q_n) - d(p_m, q_m) \le d(p_n, p_m) + d(q_m, q_n)$. Reversing the roles
of $m$ and $n$, we obtain

$$|d(p_n, q_n) - d(p_m, q_m)| \le d(p_n, p_m) + d(q_m, q_n).$$

The two terms on the right side tend to 0, since $\{p_n\}$ and $\{q_n\}$ are Cauchy, and
hence $\{d(p_n, q_n)\}$ is Cauchy. Thus the limit (*) exists.

We have also to show that the limit (*) is independent of the choice of representatives.
Let $\{p_n\}$ and $\{p_n'\}$ be in $P$, and let $\{q_n\}$ and $\{q_n'\}$ be in $Q$. Then

$$d(p_n, q_n) \le d(p_n, p_n') + d(p_n', q_n') + d(q_n', q_n).$$

Since the first and third terms on the right side tend to 0 and the other terms in
the inequality have limits, we obtain $\lim_n d(p_n, q_n) \le \lim_n d(p_n', q_n')$. Reversing
the roles of the primed and unprimed symbols, we obtain $\lim d(p_n', q_n') \le
\lim d(p_n, q_n)$. Therefore $\lim d(p_n, q_n) = \lim d(p_n', q_n')$, and $\Delta(P, Q)$ is well
defined.

Let us see that $(X^*, \Delta)$ is a metric space. Certainly $\Delta(P, P) = 0$ and
$\Delta(P, Q) = \Delta(Q, P)$. To prove the triangle inequality

$$\Delta(P, Q) \le \Delta(P, R) + \Delta(R, Q), \qquad (**)$$

let $\{p_n\}$ be in $P$, $\{q_n\}$ be in $Q$, and $\{r_n\}$ be in $R$. Since

$$d(p_n, q_n) \le d(p_n, r_n) + d(r_n, q_n),$$

we obtain (**) by passing to the limit. Finally if two unequal classes $P$ and $Q$
are given, and if $\{p_n\}$ and $\{q_n\}$ are representatives, then $\lim d(p_n, q_n) \ne 0$ by
definition of $\sim$. Therefore $\Delta(P, Q) > 0$. Thus $(X^*, \Delta)$ is a metric space.

Now we can define the isometry $\varphi : X \to X^*$. If $x$ is in $X$, then $\varphi(x)$ is the
equivalence class of the constant sequence $\{p_n\}$ in which $p_n = x$ for all $n$. To
see that $\varphi$ is an isometry, let $x$ and $y$ be in $X$, let $p_n = x$ for all $n$, and let $q_n = y$
for all $n$. Then $\Delta(\varphi(x), \varphi(y)) = \lim d(p_n, q_n) = \lim d(x, y) = d(x, y)$, and $\varphi$
is an isometry.

Let us prove that $\varphi(X)$ is dense in $X^*$. In fact, if $P$ is in $X^*$ and $\{p_n\}$ is
a representative, we show that $\varphi(p_n) \to P$. If $\varphi(p_n) = P$ for all sufficiently
large $n$, then $P$ is in $\varphi(X)$; otherwise this limit relation will exhibit $P$ as a limit
point of $\varphi(X)$, and we can conclude that $P$ is in $\varphi(X)^{\mathrm{cl}}$ in any case. In other
words, $\varphi(p_n) \to P$ implies that $\varphi(X)$ is dense. To prove that we actually do
have $\varphi(p_n) \to P$, let $\epsilon > 0$ be given. Choose $N$ such that $k \ge m \ge N$ implies
$d(p_m, p_k) < \epsilon$. Then $\Delta(\varphi(p_m), P) = \lim_k d(p_m, p_k) \le \epsilon$ for $m \ge N$. Hence
$\lim_m \Delta(\varphi(p_m), P) = 0$ as required.

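Finally let us prove that $X^*$ is complete by showing directly that any Cauchy
sequence $\{P_n\}$ converges. Since $\varphi(X)$ is dense in $X^*$, we can choose $x_n \in X$ with
$\Delta(\varphi(x_n), P_n) < 1/n$. First let us prove that $\{x_n\}$ is Cauchy in $X$. Let $\epsilon > 0$ be
given, and choose $N$ large enough so that $\Delta(P_n, P_{n'}) < \epsilon/3$ when $n$ and $n'$ are
$\ge N$. Possibly by taking $N$ still larger, we may assume that $1/N < \epsilon/3$. Then
whenever $n$ and $n'$ are $\ge N$, we have

$$\begin{aligned}
d(x_n, x_{n'}) &= \Delta(\varphi(x_n), \varphi(x_{n'})) \\
&\le \Delta(\varphi(x_n), P_n) + \Delta(P_n, P_{n'}) + \Delta(P_{n'}, \varphi(x_{n'})) \\
&< \frac{1}{n} + \frac{\epsilon}{3} + \frac{1}{n'} \le \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\end{aligned}$$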
Thus $\{x_n\}$ is Cauchy in $X$. Let $P \in X^*$ be the equivalence class to which $\{x_n\}$
belongs. We prove completeness by showing that $P_n \to P$. Let $\epsilon > 0$ be given,
and choose $N$ large enough so that $r \ge n \ge N$ implies $d(x_n, x_r) < \epsilon/2$. Possibly
by taking $N$ still larger, we may assume that $\frac{1}{N} < \frac{\epsilon}{2}$. Then $r \ge n \ge N$ implies

$$\Delta(P_n, P) \le \Delta(P_n, \varphi(x_n)) + \Delta(\varphi(x_n), P) \le \frac{1}{n} + \lim_r d(x_n, x_r) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

Thus $P_n \to P$. Hence every Cauchy sequence in $X^*$ converges, and $X^*$ is
complete.
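
To make the construction concrete in the motivating case where X is the set of rationals, the following Python sketch builds two different Cauchy sequences of rationals and checks that the distances between corresponding terms tend to 0, so that the two sequences represent the same class. It is an illustration under assumptions made here: the sequences come from the Newton iteration for the square root of 2, and the names `newton_sqrt2`, `p`, `q`, and `gaps` are hypothetical.

```python
from fractions import Fraction

def newton_sqrt2(start, steps):
    """Rational sequence x_{n+1} = (x_n + 2/x_n)/2; it is Cauchy in Q."""
    xs = [Fraction(start)]
    for _ in range(steps):
        x = xs[-1]
        xs.append((x + 2 / x) / 2)
    return xs

def d(a, b):
    """The usual metric on the rationals."""
    return abs(a - b)

p = newton_sqrt2(1, 6)   # one Cauchy sequence of rationals
q = newton_sqrt2(3, 6)   # a different Cauchy sequence of rationals

# d(p_n, q_n) tends to 0, so {p_n} and {q_n} are equivalent.
gaps = [d(a, b) for a, b in zip(p, q)]
print([float(g) for g in gaps])

# Distance from the class of the constant sequence 3/2, approximated by a late term.
print(float(d(Fraction(3, 2), p[-1])))
```

Neither sequence converges in the rationals, but both define the same point of the completion, which plays the role of the square root of 2.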


An important application of Theorem 2.60 for algebraic number theory is 
to the construction of the p-adic numbers, p being prime. The metric space 
that is completed is the set of rationals with a certain nonstandard metric. This 
application appears in Problems 27-31 at the end of this chapter.
1. As in Example 9 of Section 1, let $S$ be a nonempty set, fix an integer $n > 0$, and
let $X$ be the set of $n$-tuples of members of $S$. For $n$-tuples $x = (x_1, \dots, x_n)$ and
$y = (y_1, \dots, y_n)$, define $d(x, y) = \#\{j \mid x_j \ne y_j\}$, the number of components in
which $x$ and $y$ differ. Prove that $d$ satisfies the triangle inequality, so that $(X, d)$
is a metric space.


2. Prove that a separable metric space is the disjoint union of an open set that is at
most countable and a closed set in which every point is a limit point.


3. Give an example of a function $f : [0, 1] \to \mathbb{R}$ for which the graph of $f$, given
by $\{(x, f(x)) \mid 0 \le x \le 1\}$, is a closed subset of $\mathbb{R}^2$ and yet $f$ is not continuous.


4. If $A$ is a dense subset of a metric space $(X, d)$ and $U$ is open in $X$, prove that
$U \subseteq (A \cap U)^{\mathrm{cl}}$.


5. Let $(X, d)$ be a metric space, let $U$ be an open set, and let $E_1 \supseteq E_2 \supseteq \cdots$ be a
decreasing sequence of closed bounded sets with $\bigcap_{j=1}^{\infty} E_j \subseteq U$.

(a) For $X$ equal to $\mathbb{R}^n$, show that $E_N \subseteq U$ for some $N$.

(b) For $X$ equal to the subspace $\mathbb{Q}$ of rationals in $\mathbb{R}^1$, give an example to show
that $E_N \subseteq U$ can fail for every $N$.


6. Let $F : X \times Y \to Z$ be a function from the product of two metric spaces into a
metric space.

(a) Suppose that $(x, y) \mapsto F(x, y)$ is continuous and that $Y$ is compact. Prove
that $F(x, \cdot)$ tends to $F(x_0, \cdot)$ uniformly on $Y$ as $x$ tends to $x_0$.

(b) Conversely suppose $(x, y) \mapsto F(x, y)$ is continuous except possibly at points
$(x, y) = (x_0, y)$, and suppose that $F(x, \cdot) \to F(x_0, \cdot)$ uniformly. Prove
that $F$ is continuous everywhere.
7. Give an example of a continuous function between two metric spaces that fails
to carry some Cauchy sequence to a Cauchy sequence.


8. (Contraction mapping principle) Let $(X, d)$ be a complete metric space, let
$r$ be a number with $0 < r < 1$, and let $f : X \to X$ be a contraction mapping,
i.e., a function such that $d(f(x), f(y)) \le r\,d(x, y)$ for all $x$ and $y$ in $X$. Prove
that there exists a unique $x_0$ in $X$ such that $f(x_0) = x_0$.


9. Prove that a countable complete metric space has an isolated point. 


10. A metric space $(X, d)$ is called locally connected if each point has arbitrarily
small open neighborhoods that are connected. Let $C$ be a Cantor set in $[0, 1]$, as
described in Section 9, and let $X \subseteq \mathbb{R}^2$ be the union of the three sets $C \times [0, 1]$,
$[0, 1] \times \{0\}$, and $[0, 1] \times \{1\}$. Prove that $X$ is compact and connected but is not
locally connected.
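
As a numerical illustration of the contraction mapping principle in Problem 8 (not a solution to the problem), the following Python sketch iterates a contraction until successive iterates agree to a tolerance. The choice of the cosine function on [0, 1], the starting point, the tolerance, and the name `fixed_point` are assumptions made here.

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{k+1} = f(x_k) until successive iterates differ by at most tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) <= tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin 1 < 1 there,
# so it is a contraction of the complete metric space [0, 1].
x_star = fixed_point(math.cos, 0.5)
print(x_star, abs(math.cos(x_star) - x_star))
```

Completeness is what guarantees in general that the Cauchy sequence of iterates has a limit in the space; on the rationals alone the same iteration could fail to have a limit.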


Problems 11-13 concern the relationship between connected and pathwise connected. 
It was observed in Section 8 that pathwise connected implies connected. A metric 
space is called locally pathwise connected if each point has arbitrarily small open 
neighborhoods that are pathwise connected. 


11. Prove that a metric space (X, d) that is connected and locally pathwise connected 
is pathwise connected. 


12. Deduce from the previous problem that for an open subset of $\mathbb{R}^n$, connected
implies pathwise connected.


13. Prove that any open subset of $\mathbb{R}^1$ is uniquely the disjoint union of open intervals.


Problems 14-17 concern almost periodic functions. Let $f : \mathbb{R}^1 \to \mathbb{C}$ be a bounded
uniformly continuous function. If $\epsilon > 0$, an $\epsilon$ almost period for $f$ is a number $t$ such
that $|f(x + t) - f(x)| \le \epsilon$ for all real $x$. A subset $E$ of $\mathbb{R}^1$ is called relatively dense
if there is some $L > 0$ such that any interval of length $\ge L$ contains a member of $E$.
The function $f$ is Bohr almost periodic if for every $\epsilon > 0$, its set of $\epsilon$ almost periods
is relatively dense. The function $f$ is Bochner almost periodic if every sequence of
translates $\{f_{t_n}\}$, where $f_t(x) = f(x + t)$, has a uniformly convergent subsequence.
Any function $x \mapsto e^{icx}$ with $c$ real is an example.


14. As usual, let $B(\mathbb{R}^1, \mathbb{C})$ be the metric space of bounded complex-valued functions
on $\mathbb{R}^1$ in the uniform metric. Show that the subspace of bounded uniformly
continuous functions is closed, hence complete.


15. Show that a bounded uniformly continuous function $f : \mathbb{R}^1 \to \mathbb{C}$ is Bohr almost
periodic if and only if the set $\{f_t \mid t \in \mathbb{R}^1\}$ is totally bounded in $B(\mathbb{R}^1, \mathbb{C})$.


16. Prove that a bounded uniformly continuous function $f : \mathbb{R}^1 \to \mathbb{C}$ is Bohr almost
periodic if and only if it is Bochner almost periodic. Thus the names Bohr and
Bochner can be dropped.
17. Prove that the set of almost periodic functions on $\mathbb{R}^1$ is an algebra closed under
complex conjugation and containing the constants. Prove also that it is closed
under uniform limits.


Problems 18-20 concern the special case whose proof precedes that of the Stone-Weierstrass
Theorem (Theorem 2.58). In the text in Section 10, this preliminary
special case was the function $|x|$ on $[-1, 1]$, and it was handled in two ways: in
Section I.8 by the binomial expansion and Abel's Theorem and in Section I.9 as a
special case of the Weierstrass Approximation Theorem. The problems in the present
group handle an alternative preliminary special case, the function $\sqrt{x}$ on $[0, 1]$. This
is just as good because $|x| = \sqrt{x^2}$.

18. (Dini's Theorem) Let $X$ be a compact metric space. Suppose that $f_n : X \to \mathbb{R}$
is continuous, that $f_1 \le f_2 \le f_3 \le \cdots$, and that $f(x) = \lim f_n(x)$ is
continuous and is nowhere $+\infty$. Use the defining property of compactness
to prove that $f_n$ converges to $f$ uniformly on $X$.

19. Define a sequence of polynomial functions $P_n : [0, 1] \to \mathbb{R}$ by $P_0(x) = 0$ and
$P_{n+1}(x) = P_n(x) + \frac{1}{2}(x - P_n(x)^2)$. Prove that $0 = P_0 \le P_1 \le P_2 \le \cdots \le
\sqrt{x} \le 1$ and that $\lim_n P_n(x) = \sqrt{x}$ for all $x$ in $[0, 1]$.

20. Combine the previous two problems to prove that $\sqrt{x}$ is the uniform limit of
polynomial functions on $[0, 1]$.
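
The recursion in Problem 19 can be explored numerically before it is analyzed. The Python sketch below evaluates the polynomials pointwise on a grid and prints the largest gap between the square root and the n-th polynomial for a few values of n. It is an illustration only, not a substitute for the proofs requested; the grid, the sample values of n, and the name `sqrt_polynomials` are assumptions made here.

```python
import math

def sqrt_polynomials(x, n_max):
    """Return [P_0(x), ..., P_{n_max}(x)] for P_{n+1} = P_n + (x - P_n**2) / 2."""
    values = [0.0]
    for _ in range(n_max):
        p = values[-1]
        values.append(p + 0.5 * (x - p * p))
    return values

grid = [k / 50 for k in range(51)]          # sample points in [0, 1]
for n in (5, 50, 500):
    # Largest observed gap sqrt(x) - P_n(x) over the grid.
    err = max(math.sqrt(x) - sqrt_polynomials(x, n)[-1] for x in grid)
    print(n, err)
```

The printed gaps shrink as n grows, which is consistent with the uniform convergence asserted in Problem 20; the slowest convergence occurs at sample points near 0.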


Problems 21-24 concern the effect of removing from the Stone-Weierstrass Theorem
(Theorem 2.58) the hypothesis that the given algebra contains the constants. Let $(S, d)$
be a compact metric space, and let $A$ be a subalgebra of $C(S, \mathbb{R})$ that separates points.
There can be no pair of points $\{x, y\}$ such that all members of $A$ vanish at $x$ and $y$.

21. If for each $s \in S$, there is some member of $A$ that is nonzero at $s$, prove in the
following way that $A$ is still dense in $C(S, \mathbb{R})$: Observe that the only place in
the proof of Theorem 2.58a that the presence of constant functions is used is in
the construction of the function $h_y$ in the third paragraph. Show that a function
$h_y$ still exists in $A^{\mathrm{cl}}$ with $h_y(x) = f(x)$ and $h_y(y) = f(y)$ under the weaker
hypothesis that for each $s \in S$, there is some member of $A$ that is nonzero at $s$.

22. Suppose that the members of $A$ all vanish at some $s_0$ in $S$. Let $\mathcal{B} = A + \mathbb{R}1$,
so that Theorem 2.58a applies to $\mathcal{B}$. Use the linear function $L : C(S, \mathbb{R}) \to \mathbb{R}$
given by $L(f) = f(s_0)$, together with the fact that $\mathcal{B}^{\mathrm{cl}} = C(S, \mathbb{R})$, to prove that
$A$ is uniformly dense in the subalgebra of all members of $C(S, \mathbb{R})$ that vanish
at $s_0$.

23. Adapt the above arguments to prove corresponding results about the algebra
$C(S, \mathbb{C})$ of complex-valued continuous functions.

24. Let $C_0([0, +\infty), \mathbb{R})$ be the algebra of continuous functions from $[0, +\infty)$ into
$\mathbb{R}$ that have limit 0 at $+\infty$.
(a) Prove that the set of all finite linear combinations of functions $e^{-nx}$ for
positive integers $n$ is dense in $C_0([0, +\infty), \mathbb{R})$.
(b) Suppose that $f$ is in $C_0([0, +\infty), \mathbb{R})$, that $f(x) = 0$ for $x \ge b$, and that
$\int_0^b f(x) e^{-nx}\,dx = 0$ for all integers $n > 0$. Prove that $f$ is the 0 function.


Problems 25-26 concern completions of a metric space. They use the notation of
Theorem 2.60. The first problem says that the completion is essentially unique, and 
the second problem addresses the question of what happens if the original space is 
already complete; in particular it shows that the completion of the completion is the 
completion. 


25. Suppose that $(X, d)$ is a metric space, that $(X_1^*, \Delta_1)$ and $(X_2^*, \Delta_2)$ are complete
metric spaces, and that $\varphi_1 : X \to X_1^*$ and $\varphi_2 : X \to X_2^*$ are isometries such that
$\varphi_1(X)$ is dense in $X_1^*$ and $\varphi_2(X)$ is dense in $X_2^*$. Prove that there exists a unique
isometry $\psi$ of $X_1^*$ onto $X_2^*$ such that $\varphi_2 = \psi \circ \varphi_1$.


26. Prove that a metric space $X$ is complete if and only if $X^* = X$, i.e., if and only
if the standard isometry $\varphi$ of $X$ into its completion $X^*$ is onto.


Problems 27-31 concern the field $\mathbb{Q}_p$ of $p$-adic numbers. The problems assume
knowledge of unique factorization for the integers; the last problem in addition
assumes knowledge of rings, ideals, and quotient rings. Let $\mathbb{Q}$ be the set of rational
numbers with their usual arithmetic, and fix a prime number $p$. Each nonzero rational
number $r$ can be written, via unique factorization of integers, as $r = mp^k/n$ with $p$
not dividing $m$ or $n$ and with $k$ a well-defined integer (positive, negative, or zero).
Define $|r|_p = p^{-k}$. For $r = 0$, define $|0|_p = 0$. The function $|\cdot|_p$ plays a role in
the relationship between $\mathbb{Q}$ and $\mathbb{Q}_p$ similar to the role played by absolute value in the
relationship between $\mathbb{Q}$ and $\mathbb{R}$.
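
For experimenting with the definition just given, the following Python sketch computes the p-adic absolute value of a rational number exactly, by stripping powers of p from the numerator and denominator. It is a sketch for illustration; the function name `padic_abs` and the sample inputs are our own choices and not part of the text.

```python
from fractions import Fraction

def padic_abs(r, p):
    """|r|_p = p**(-k), where r = m * p**k / n with p dividing neither m nor n."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    k = 0
    num, den = r.numerator, r.denominator
    while num % p == 0:      # powers of p in the numerator raise k
        num //= p
        k += 1
    while den % p == 0:      # powers of p in the denominator lower k
        den //= p
        k -= 1
    return Fraction(p) ** (-k)

print(padic_abs(Fraction(50, 3), 5))   # 50/3 = 2 * 5**2 / 3, so k = 2
print(padic_abs(Fraction(7, 45), 3))   # 7/45 = 7 * 3**(-2) / 5, so k = -2
```

Because `Fraction` keeps numerator and denominator in lowest terms, at most one of the two loops does any work for a given input. Numbers divisible by a high power of p come out small, which is the sense in which this absolute value is nonstandard.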


27. Prove that $|\cdot|_p$ on $\mathbb{Q}$ satisfies (i) $|r|_p \ge 0$ with equality if and only if $r = 0$,
(ii) $|-r|_p = |r|_p$, (iii) $|rs|_p = |r|_p |s|_p$, and (iv) $|r + s|_p \le \max\{|r|_p, |s|_p\}$.
Property (iv) is called the ultrametric inequality.


28. Show that $(\mathbb{Q}, d)$ is a metric space under the definition $d(r, s) = |r - s|_p$.


29. Let $(\mathbb{Q}_p, d)$ be the completion of the metric space $(\mathbb{Q}, d)$. Since $|r|_p$ can be
recovered from the metric by $|r|_p = d(r, 0)$, the function $|\cdot|_p$ extends to a
continuous function $|\cdot|_p : \mathbb{Q}_p \to \mathbb{R}$.

(a) Using Proposition 2.47, show that addition, as a function from $\mathbb{Q} \times \mathbb{Q}$ to $\mathbb{Q}_p$,
extends to a continuous function from $\mathbb{Q}_p \times \mathbb{Q}_p$ to $\mathbb{Q}_p$. Argue similarly that
the operation of passing to the negative, as a function from $\mathbb{Q}$ to $\mathbb{Q}_p$, extends
to a continuous function from $\mathbb{Q}_p$ to $\mathbb{Q}_p$. Then prove that $\mathbb{Q}_p$ is an abelian
group under addition.

(b) Show that multiplication, as a function from $\mathbb{Q} \times \mathbb{Q}$ to $\mathbb{Q}_p$, extends to a
continuous function from $\mathbb{Q}_p \times \mathbb{Q}_p$ to $\mathbb{Q}_p$. (This part is subtler than (a)
because multiplication is not uniformly continuous as a function of two
variables.)

(c) Let $\mathbb{Q}^\times = \mathbb{Q} - \{0\}$ and $\mathbb{Q}_p^\times = \mathbb{Q}_p - \{0\}$. Show that the operation of
taking the reciprocal, as a function from $\mathbb{Q}^\times$ to $\mathbb{Q}_p^\times$, extends to a continuous
function from $\mathbb{Q}_p^\times$ to itself. Then prove that $\mathbb{Q}_p^\times$ is an abelian group under
multiplication.

(d) Complete the proof that $\mathbb{Q}_p$ is a field by establishing the distributive law
$t(r + s) = tr + ts$ within $\mathbb{Q}_p$.

30. (a) Prove that the subset $\{t \in \mathbb{Q}_p \mid |t|_p \le 1\}$ of $\mathbb{Q}_p$ is totally bounded.
(b) Prove that a subset of $\mathbb{Q}_p$ is compact if and only if it is closed and bounded.

31. Prove that the subset $\mathbb{Z}_p$ of $\mathbb{Q}_p$ with $|x|_p \le 1$ is a commutative ring with identity,
that the subset $P$ with $|x|_p \le p^{-1}$ is an ideal in $\mathbb{Z}_p$, and that the quotient $\mathbb{Z}_p/P$
is a field of $p$ elements.


CHAPTER III 


Theory of Calculus in Several Real Variables 


Abstract. This chapter gives a rigorous treatment of parts of the calculus of several variables. 

Sections 1-3 handle the more elementary parts of the differential calculus. Section 1 introduces an
operator norm that makes the space of linear functions from $\mathbb{R}^n$ to $\mathbb{R}^m$ or from $\mathbb{C}^n$ to $\mathbb{C}^m$ into a metric
space. Section 2 goes through the definitions and elementary facts about differentiation in several
variables in terms of linear transformations and matrices. The chain rule and Taylor's Theorem with
integral remainder are two of the results of the section. Section 3 supplements Section 2 in order to
allow vector-valued and complex-valued extensions of all the results.

Sections 4-5 are digressions. The material in these sections uses the techniques of the present
chapter but is not needed until later. Section 4 develops the exponential function on complex square 
matrices and establishes its properties; it will be applied in Chapter IV. Section 5 establishes the 
existence of partitions of unity in Euclidean space; this result will be applied at the end of Section 10. 

Section 6 returns to the development in Section 2 and proves two important theorems about
differential calculus. The Inverse Function Theorem gives sufficient conditions under which a
differentiable function from an open set in $\mathbb{R}^n$ into $\mathbb{R}^n$ has a locally defined differentiable inverse,
and the Implicit Function Theorem gives sufficient conditions for the local solvability of $m$ nonlinear
equations in $n + m$ variables for $m$ of the variables in terms of the other $n$. The Inverse Function
Theorem is proved on its own, and the Implicit Function Theorem is derived from it.

Sections 7-10 treat Riemann integration in several variables. Elementary properties analogous 
to those in the one-variable case are in Section 7, a useful necessary and sufficient condition for 
Riemann integrability is established in Section 8, Fubini’s Theorem for interchanging the order of 
integration is in Section 9, and a preliminary change-of-variables theorem for multiple integrals is 
in Section 10. 


1. Operator Norm 


This section works with linear functions from n-dimensional column-vector space 
to m-dimensional column-vector space. It will have applications within this 
chapter both when the scalars are real and when the scalars are complex. To 
be neutral let us therefore write $F$ for $\mathbb{R}$ or $\mathbb{C}$. Material on the correspondence
between linear functions and matrices may be found in Section A7 of the appendix. 

Specifically let $L(F^n, F^m)$ be the vector space of all linear functions from $F^n$
into $F^m$. This space corresponds to the vector space of m-by-n matrices with
entries in $F$, as follows: In the notation in Section A7 of the appendix, we let
$(e_1, \dots, e_n)$ be the standard ordered basis of $F^n$, and $(u_1, \dots, u_m)$ the standard
ordered basis of $F^m$. We define a dot product in $F^m$ by

\[
(a_1, \dots, a_m) \cdot (b_1, \dots, b_m) = a_1 b_1 + \cdots + a_m b_m
\]

with no complex conjugations involved. The correspondence of a linear function
$T$ in $L(F^n, F^m)$ to a matrix $A$ with entries in $F$ is then given by $A_{ij} = T(e_j) \cdot u_i$.
Let $|\cdot|$ denote the Euclidean norm on $F^n$ or $F^m$, given as in Section II.1 by
the square root of the sum of the absolute values squared of the entries. The
Euclidean norm makes $F^n$ and $F^m$ into metric spaces, the distance between two
points being the Euclidean norm of the difference.


Proposition 3.1. If $T$ is a member of the space $L(F^n, F^m)$ of linear functions
from $F^n$ to $F^m$, then there exists a finite $M$ such that $|T(x)| \le M|x|$ for all $x$ in
$F^n$. Consequently $T$ is uniformly continuous on $F^n$.


PROOF. Each $x$ in $F^n$ has $x = \sum_{j=1}^n (x \cdot e_j) e_j$, and linearity gives
$T(x) = \sum_{j=1}^n (x \cdot e_j) T(e_j)$. Thus

\[
|T(x)| = \Big| \sum_{j=1}^n (x \cdot e_j) T(e_j) \Big| \le \sum_{j=1}^n |T(e_j)|\,|x \cdot e_j|.
\]

The expression $x \cdot e_j$ is just the $j$th entry of $x$, and hence $|x \cdot e_j| \le |x|$. Therefore
$|T(x)| \le \big( \sum_{j=1}^n |T(e_j)| \big) |x|$, and the first conclusion has been proved with
$M = \sum_{j=1}^n |T(e_j)|$. Replacing $x$ by $x - y$ gives

\[
|T(x) - T(y)| = |T(x - y)| \le M|x - y|,
\]

and uniform continuity of $T$ follows with $\delta = \epsilon/M$.


Let $T$ be in $L(F^n, F^m)$. Using Proposition 3.1, we define the operator norm
$\|T\|$ of $T$ to be the nonnegative number

\[
\|T\| = \inf \{\, M \ge 0 \mid |T(x)| \le M|x| \text{ for all } x \in F^n \,\}.
\]

Then $|T(x)| \le \|T\|\,|x|$ for all $x \in F^n$.


Since $|T(cx)| = |c|\,|T(x)|$ for any scalar $c$, the inequality $|T(x)| \le M|x|$ holds for
all $x \ne 0$ if and only if it holds for all $x$ with $0 < |x| \le 1$, if and only if it
holds for all $x$ with $|x| = 1$. Also, we have $T(0) = 0$. It follows that two other
expressions for $\|T\|$ are

\[
\|T\| = \sup_{|x| \le 1} |T(x)| = \sup_{|x| = 1} |T(x)|.
\]


Proposition 3.2. The operator norm on $L(F^n, F^m)$ satisfies

(a) $\|T\| \ge 0$ with equality if and only if $T = 0$,

(b) $\|cT\| = |c|\,\|T\|$ for $c$ in $F$,

(c) $\|T + S\| \le \|T\| + \|S\|$,

(d) $\|TS\| \le \|T\|\,\|S\|$ if $S$ is in $L(F^n, F^m)$ and $T$ is in $L(F^m, F^k)$,

(e) $\|1\| = 1$ if $n = m$ and $1$ denotes the identity function on $F^n$.


PROOF. All the properties but (d) are immediate. For (d), we have

\[
|(TS)(x)| = |T(S(x))| \le \|T\|\,|S(x)| \le \|T\|\,\|S\|\,|x|.
\]

Taking the supremum for $|x| \le 1$ yields $\|TS\| \le \|T\|\,\|S\|$.


Corollary 3.3. The space $L(F^n, F^m)$ becomes a metric space when a metric
$d$ is defined by $d(T, S) = \|T - S\|$.


PROOF. Conclusion (a) of Proposition 3.2 shows that $d(T, S) \ge 0$ with equality
if and only if $T = S$, conclusion (b) shows that $d(T, S) = d(S, T)$, and conclusion
(c) yields the triangle inequality because substitution of $T = T' - V'$ and $S =
V' - U'$ into (c) yields $d(T', U') \le d(T', V') + d(V', U')$.


Suppose that $F = \mathbb{C}$. If the matrix $A$ that corresponds to some $T$ in $L(\mathbb{C}^n, \mathbb{C}^m)$
has real entries, we can regard $T$ as a member of $L(\mathbb{R}^n, \mathbb{R}^m)$, as well as a member
of $L(\mathbb{C}^n, \mathbb{C}^m)$. Two different definitions of $\|T\|$ are in force. Let us check that
they yield the same value for $\|T\|$.


Proposition 3.4. Let $T$ be in $L(\mathbb{C}^n, \mathbb{C}^m)$, and suppose that the vector $T(e_j)$
lies in $\mathbb{R}^m$ for $1 \le j \le n$. Then $T$ carries $\mathbb{R}^n$ into $\mathbb{R}^m$, and $\|T\|$ is consistently
defined in the sense that

\[
\|T\| = \sup_{x \in \mathbb{R}^n,\ |x| \le 1} |T(x)| = \sup_{z \in \mathbb{C}^n,\ |z| \le 1} |T(z)|.
\]

PROOF. The first conclusion follows since $T$ is $\mathbb{R}$ linear. For the second
conclusion, let $\|T\|_{\mathbb{R}}$ and $\|T\|_{\mathbb{C}}$ be the middle and right expressions, respectively,
in the displayed equation above. Certainly we have $\|T\|_{\mathbb{R}} \le \|T\|_{\mathbb{C}}$. If $z$ is in $\mathbb{C}^n$,
write $z = x + iy$ with $x$ and $y$ in $\mathbb{R}^n$. Since $T(x)$ and $T(y)$ are in $\mathbb{R}^m$ and $T$ is $\mathbb{C}$
linear,

\[
\begin{aligned}
|T(z)|^2 &= |T(x) + iT(y)|^2 = |T(x)|^2 + |T(y)|^2 \\
&\le (\|T\|_{\mathbb{R}} |x|)^2 + (\|T\|_{\mathbb{R}} |y|)^2 = \|T\|_{\mathbb{R}}^2 (|x|^2 + |y|^2) = \|T\|_{\mathbb{R}}^2 |z|^2.
\end{aligned}
\]

Hence $|T(z)| \le \|T\|_{\mathbb{R}} |z|$, and it follows that $\|T\|_{\mathbb{C}} \le \|T\|_{\mathbb{R}}$. The second
conclusion follows.


We shall encounter limits of linear functions in the metric $d$ given in Corollary
3.3, and it is worth knowing just what these limits mean. For this purpose, let $T$
be in $L(F^n, F^m)$, and define the Hilbert-Schmidt norm of $T$ to be

\[
|T| = \Big( \sum_{j=1}^n |T(e_j)|^2 \Big)^{1/2}.
\]

This quantity has an interpretation in terms of the m-by-n matrix $A$ that is
associated to the linear function $T$ by the above formula $A_{ij} = T(e_j) \cdot u_i$. Namely,
$|T|$ equals $\big( \sum_{i,j} |A_{ij}|^2 \big)^{1/2}$, which is just the Euclidean norm of the matrix $A$
if we think of $A$ as lying in $F^{mn}$. This correspondence provides the license for
using the notation of a Euclidean norm for the Hilbert-Schmidt norm of $T$. The
Hilbert-Schmidt norm has the same three properties as the operator norm that
allow us to use it to define a metric:

(i) $|T| \ge 0$ with equality if and only if $T = 0$,

(ii) $|cT| = |c|\,|T|$ for $c$ in $F$,

(iii) $|T + S| \le |T| + |S|$.

Let us write $d_2(T, S) = |T - S|$ for the associated metric. Parenthetically we
might mention that the analogs of (d) and (e) for the Hilbert-Schmidt norm are

(iv) $|TS| \le |T|\,|S|$ if $S$ is in $L(F^n, F^m)$ and $T$ is in $L(F^m, F^k)$,

(v) $|1| = \sqrt{n}$ if $n = m$ and $1$ denotes the identity function on $F^n$.

We shall have no need for these last two properties, and their proofs are left to be
done in Problem 1 at the end of the chapter.


Proposition 3.5. The operator norm and Hilbert-Schmidt norm on $L(F^n, F^m)$
are related by

\[
\|T\| \le |T| \le \sqrt{n}\,\|T\|.
\]

Consequently the associated metrics are related by

\[
d \le d_2 \le \sqrt{n}\, d.
\]


PROOF. If $|x| \le 1$, then the triangle inequality and the classical Schwarz
inequality of Section A5 give

\[
\begin{aligned}
|T(x)| &= \Big| \sum_{j=1}^n (x \cdot e_j) T(e_j) \Big| \le \sum_{j=1}^n |x \cdot e_j|\,|T(e_j)| \\
&\le \Big( \sum_{j=1}^n |x \cdot e_j|^2 \Big)^{1/2} \Big( \sum_{j=1}^n |T(e_j)|^2 \Big)^{1/2}
 = |x| \Big( \sum_{j=1}^n |T(e_j)|^2 \Big)^{1/2} \le |T|.
\end{aligned}
\]

Taking the supremum over $x$ yields $\|T\| \le |T|$. In addition,

\[
|T|^2 = \sum_{j=1}^n |T(e_j)|^2 \le \sum_{j=1}^n \|T\|^2 |e_j|^2 = n\|T\|^2,
\]

and the second asserted inequality follows.


Proposition 3.5 implies that the identity map between the two metric spaces
$(L(F^n, F^m), d)$ and $(L(F^n, F^m), d_2)$ is uniformly continuous and has a uniformly
continuous inverse. Therefore open sets, convergent sequences, and even Cauchy
sequences are the same in the two metrics. Briefly said, convergence in the 
operator norm means entry-by-entry convergence of the associated matrices, and 
similarly for Cauchy sequences. 
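
The comparison of the two norms can be checked numerically on a small example. The following is a minimal sketch, assuming a real 2-by-2 matrix; the function names are illustrative and are not notation from the text. It relies on the standard fact, not proved in this section, that the operator norm of a real matrix $A$ is the square root of the largest eigenvalue of $A^T A$, and it cross-checks that value against a direct search over unit vectors.

```python
import math

def hilbert_schmidt_norm(a):
    """Euclidean norm of the entries of a matrix given as a list of rows."""
    return math.sqrt(sum(entry * entry for row in a for entry in row))

def operator_norm_2x2(a):
    """Operator norm of a real 2-by-2 matrix.

    Assumption (standard fact, not from the text): sup |Ax| over |x| = 1 is
    the square root of the largest eigenvalue of the symmetric matrix A^T A.
    """
    (a11, a12), (a21, a22) = a
    p = a11 * a11 + a21 * a21          # (A^T A)_{11}
    q = a11 * a12 + a21 * a22          # (A^T A)_{12}
    r = a12 * a12 + a22 * a22          # (A^T A)_{22}
    largest = 0.5 * ((p + r) + math.sqrt((p - r) ** 2 + 4.0 * q * q))
    return math.sqrt(largest)

def sampled_operator_norm(a, samples=10000):
    """Lower estimate of sup |Ax| taken over the unit vectors (cos s, sin s)."""
    best = 0.0
    for i in range(samples):
        s = 2.0 * math.pi * i / samples
        x = (math.cos(s), math.sin(s))
        tx = [row[0] * x[0] + row[1] * x[1] for row in a]
        best = max(best, math.hypot(tx[0], tx[1]))
    return best

A = [[1.0, 2.0], [3.0, 4.0]]
op = operator_norm_2x2(A)
hs = hilbert_schmidt_norm(A)
# Proposition 3.5 with n = 2 predicts op <= hs <= sqrt(2) * op.
print(op, sampled_operator_norm(A), hs, math.sqrt(2.0) * op)
```

For this matrix the two norms are close but not equal, which is consistent with the two metrics having the same convergent sequences without being the same metric.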


2. Nonlinear Functions and Differentiation 


We begin a discussion of more general functions between Euclidean spaces by
defining the multivariable derivative for such a function and giving conditions
for its existence. Let $E$ be an open set in $\mathbb{R}^n$, and let $f : E \to \mathbb{R}^m$ be a
function. We can write

\[
f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix}, \qquad \text{where } f_i(x) = f(x) \cdot u_i.
\]

Then $f(x) = \sum_{i=1}^m f_i(x) u_i$. The functions $f_i : E \to \mathbb{R}$ are called the components of
$f$. The associated partial derivatives are given by

\[
\frac{\partial f_i}{\partial x_j}(x) = \frac{d}{dt} f_i(x + t e_j) \Big|_{t=0}.
\]

We say that $f$ is differentiable at $x$ in $E$ if there is some $T$ in $L(\mathbb{R}^n, \mathbb{R}^m)$ with

\[
\lim_{h \to 0} \frac{|f(x+h) - f(x) - T(h)|}{|h|} = 0.
\]

The linear function $T$ is unique if it exists. In fact, if $T_1$ and $T_2$ both serve as
$T$ in this limit relation, then we write

\[
T_2(h) - T_1(h) = \big( f(x+h) - f(x) - T_1(h) \big) - \big( f(x+h) - f(x) - T_2(h) \big)
\]

and find that

\[
\frac{|T_1(h) - T_2(h)|}{|h|} \le \frac{|f(x+h) - f(x) - T_1(h)|}{|h|} + \frac{|f(x+h) - f(x) - T_2(h)|}{|h|} \longrightarrow 0.
\]

If $T_1 \ne T_2$, choose some $v \in \mathbb{R}^n$ with $|v| = 1$ and $T_1(v) \ne T_2(v)$. As a nonzero
real parameter $t$ tends to 0, we must have

\[
|T_1(v) - T_2(v)|
= |tv|^{-1} \big| \big( f(x + tv) - f(x) - T_1(tv) \big) - \big( f(x + tv) - f(x) - T_2(tv) \big) \big|
\longrightarrow 0.
\]

Since $t$ does not appear on the left side but the right side tends to 0, the result is a
contradiction. Thus $T_1 = T_2$, and $T$ is unique in the definition of “differentiable.”

If $T$ exists, we write $f'(x)$ for it and call $f'(x)$ the derivative of $f$ at $x$.
If $f$ is differentiable at every point $x$ in $E$, then $x \mapsto f'(x)$ defines a function
$f' : E \to L(\mathbb{R}^n, \mathbb{R}^m)$. We deal with the differentiability of this function presently.

A differentiable function is necessarily continuous. In fact, differentiability at
$x$ implies that $|f(x+h) - f(x) - T(h)| \to 0$ as $h \to 0$. Since $T$ is continuous,
$T(h) \to 0$ also. Thus $f(x+h) \to f(x)$, and $f$ is continuous at $x$.


Proposition 3.6. Let $E$ be an open set of $\mathbb{R}^n$, and let $f : E \to \mathbb{R}^m$ be a
function. If $f'(x)$ exists, then $\dfrac{\partial f_i}{\partial x_j}(x)$ exists for all $i$ and $j$, and

\[
\frac{\partial f_i}{\partial x_j}(x) = f'(x)(e_j) \cdot u_i.
\]


REMARKS. In other words, if $f'(x)$ exists at some point $x$, then it has to be
the linear function whose matrix is $\Big[ \dfrac{\partial f_i}{\partial x_j}(x) \Big]$. This matrix is called the Jacobian
matrix of $f$ at $x$.


PROOF. We are given that

\[
\lim_{h \to 0} \frac{|f(x+h) - f(x) - f'(x)(h)|}{|h|} = 0.
\]

Dot product with a particular vector is continuous by Proposition 3.1. Take
$h = t e_j$ with $t$ real in the displayed equation, and form the dot product with $u_i$.
Then we obtain

\[
\lim_{t \to 0} \frac{|f_i(x + t e_j) - f_i(x) - t f'(x)(e_j) \cdot u_i|}{|t|} = 0.
\]

The result follows.


The natural converse to Proposition 3.6 is false: the first partial derivatives of
a function may all exist at a point, and it can still happen that $f$ is discontinuous.
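
A standard illustration of this failure, supplied here as an example and not taken from the text, is the following function on $\mathbb{R}^2$. Both first partial derivatives exist at the origin because the function vanishes identically on the two coordinate axes, yet the function takes the constant value $1/2$ along the diagonal and so is not continuous at the origin.

```latex
f(x, y) =
\begin{cases}
  \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0), \\[1ex]
  0                     & \text{if } (x, y) = (0, 0),
\end{cases}
\qquad
\frac{\partial f}{\partial x}(0, 0) = \frac{d}{dt} f(t, 0) \Big|_{t=0} = 0,
\qquad
\frac{\partial f}{\partial y}(0, 0) = \frac{d}{dt} f(0, t) \Big|_{t=0} = 0,
\qquad
f(t, t) = \tfrac{1}{2} \ \text{ for } t \neq 0 .
```

Since a differentiable function is necessarily continuous, this $f$ has no derivative at $(0, 0)$ even though its Jacobian matrix there can be written down.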

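If $f'(x)$ exists at all points of the open set $E$ in $\mathbb{R}^n$, then we obtain a function
$f' : E \to L(\mathbb{R}^n, \mathbb{R}^m)$, and we have seen that we can regard $L(\mathbb{R}^n, \mathbb{R}^m)$ as a
Euclidean space by means of the Hilbert-Schmidt norm. Let us examine what
continuity of $f'$ means and then what differentiability of $f'$ means.


Theorem 3.7. Let $E$ be an open set of $\mathbb{R}^n$, and let $f : E \to \mathbb{R}^m$ be a function.
If $f'(x)$ exists for all $x$ in $E$ and $x \mapsto f'(x)$ is continuous at some $x_0$, then
$x \mapsto \dfrac{\partial f_i}{\partial x_j}(x)$ is continuous at $x_0$ for all $i$ and $j$. Conversely if each $\dfrac{\partial f_i}{\partial x_j}(x)$ exists
at every point of $E$ and is continuous at a point $x_0$, then $f'(x_0)$ exists. If all
$\dfrac{\partial f_i}{\partial x_j}$ are continuous on $E$, then $x \mapsto f'(x)$ is continuous on $E$.


PROOF OF DIRECT PART. The partial derivative $\dfrac{\partial f_i}{\partial x_j}(x)$ is one of the entries of
$f'(x)$, regarded as a matrix, and has to be continuous if $f'(x)$ is continuous.


PROOF OF CONVERSE PART. For the moment, let $x$ be fixed. Regard $h$ as
$(h_1, \dots, h_n)$, and for $0 \le j \le n$, put $h^{(j)} = (h_1, \dots, h_j, 0, \dots, 0)$, so that $h^{(0)} = 0$
and $h^{(n)} = h$. Define $T$ to be the member of $L(\mathbb{R}^n, \mathbb{R}^m)$ with matrix $\Big[ \dfrac{\partial f_i}{\partial x_j}(x) \Big]$. Use of the Mean Value
Theorem gives

\[
\begin{aligned}
f_i(x + h) - f_i(x) &= \sum_{j=1}^n \big[ f_i(x + h^{(j)}) - f_i(x + h^{(j-1)}) \big] \\
&= \sum_{j=1}^n h_j \frac{d}{dt} f_i(x + h^{(j-1)} + t h_j e_j) \Big|_{t = t_{ij}} \qquad \text{with } 0 < t_{ij} < 1 \\
&= \sum_{j=1}^n h_j \frac{\partial f_i}{\partial x_j}(x + h^{(j-1)} + t_{ij} h_j e_j) \\
&= T(h)_i + \sum_{j=1}^n h_j \Big( \frac{\partial f_i}{\partial x_j}(x + h^{(j-1)} + t_{ij} h_j e_j) - \frac{\partial f_i}{\partial x_j}(x) \Big),
\end{aligned}
\]

and hence

\[
\frac{[f(x+h) - f(x) - T(h)]_i}{|h|} = \sum_{j=1}^n \frac{h_j}{|h|} \Big( \frac{\partial f_i}{\partial x_j}(x + h^{(j-1)} + t_{ij} h_j e_j) - \frac{\partial f_i}{\partial x_j}(x) \Big).
\]

Consequently

\[
\frac{|f(x+h) - f(x) - T(h)|}{|h|} \le \sum_{i=1}^m \sum_{j=1}^n \Big| \frac{\partial f_i}{\partial x_j}(x + h^{(j-1)} + t_{ij} h_j e_j) - \frac{\partial f_i}{\partial x_j}(x) \Big|.
\]

Let $\epsilon > 0$ be given, and recall that the partial derivatives are assumed to be
continuous at $x_0$. If $\delta > 0$ is chosen such that $|h| < \delta$ implies

\[
\Big| \frac{\partial f_i}{\partial x_j}(x_0 + h) - \frac{\partial f_i}{\partial x_j}(x_0) \Big| < \frac{\epsilon}{mn}
\]

for all $i$ and $j$, then we see, since $|h^{(j-1)} + t_{ij} h_j e_j| \le |h|$, that $0 < |h| < \delta$ implies

\[
\frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} < \epsilon.
\]

Thus $f'(x_0)$ exists.

Now assume that all the partial derivatives are continuous on $E$. Since
$L(\mathbb{R}^n, \mathbb{R}^m)$ is identified with $\mathbb{R}^{mn}$, the continuity of the entries $\dfrac{\partial f_i}{\partial x_j}(x)$ of the
matrix of $f'(x)$ implies the continuity of $f'(x)$ itself. This completes the proof.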


If $x \mapsto f'(x)$ is continuous on $E$, we say that $f$ is of class $C^1$ on $E$ or is a $C^1$
function on $E$. Let us iterate the above construction: Suppose that $E$ is open in
$\mathbb{R}^n$ and that $f : E \to \mathbb{R}^m$ is of class $C^1$, so that $x \mapsto f'(x)$ is continuous from $E$
into $L(\mathbb{R}^n, \mathbb{R}^m)$. We introduce second partial derivatives of $f$ and the derivative
of $f'$. Namely, define

\[
\frac{\partial^2 f_i}{\partial x_k\, \partial x_j} = \frac{\partial}{\partial x_k} \Big( \frac{\partial f_i}{\partial x_j} \Big).
\]

Since the entries of the matrix of $f'(x)$ are $\dfrac{\partial f_i}{\partial x_j}(x) = f'(x) e_j \cdot u_i$, the expression
$\dfrac{\partial^2 f_i}{\partial x_k\, \partial x_j}$ is the partial derivative with respect to $x_k$ of an entry of the matrix of
$f'(x)$. Thus we can say that $f$ is of class $C^2$ from $E$ into $\mathbb{R}^m$ if $f'(x)$ is of class
$C^1$, and so on. We say that $f$ is of class $C^\infty$ or is a $C^\infty$ function if it is of class
$C^k$ for all $k$. A $C^\infty$ function is also said to be smooth. We write $C^k(E)$ and
$C^\infty(E)$ for the sets of $C^k$ functions and $C^\infty$ functions on $E$.


Corollary 3.8. Let $E$ be an open set of $\mathbb{R}^n$, and let $f : E \to \mathbb{R}^m$ be a function.
The function $f$ is of class $C^k$ on $E$ if and only if all $l$th-order partial derivatives
of each $f_i$ exist and are continuous on $E$ for $l \le k$.


This is immediate from Theorem 3.7 and the intervening definitions. The
definition of a second partial derivative was given in a careful way that stresses
the order in which the partial derivatives are to be computed. Reversing the order
of two partial derivatives is a problem involving an interchange of limits. In
addressing sufficient conditions for this interchange to be valid, it is enough to
consider a function of two variables, since $n - 2$ variables will remain fixed when
we consider a mixed second partial derivative. The different components of the
function do not interfere with each other for these purposes, and thus we may
assume that the range is $\mathbb{R}^1$.


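Proposition 3.9. Let $E$ be an open set in $\mathbb{R}^2$. Suppose that $f : E \to \mathbb{R}^1$ is
a function such that $\dfrac{\partial f}{\partial x}$, $\dfrac{\partial f}{\partial y}$, and $\dfrac{\partial^2 f}{\partial y\, \partial x}$ exist in $E$ and $\dfrac{\partial^2 f}{\partial y\, \partial x}$ is continuous at
$(x, y) = (a, b)$. Then $\dfrac{\partial^2 f}{\partial x\, \partial y}(a, b)$ exists and equals $\dfrac{\partial^2 f}{\partial y\, \partial x}(a, b)$.


PROOF. Put

\[
\Delta(h, k) = \frac{f(a+h, b+k) - f(a+h, b) - f(a, b+k) + f(a, b)}{hk}
\]

and let $u(t) = f(t, b+k) - f(t, b)$. The function $u$ is a function of one variable
$t$ whose derivative is $\dfrac{\partial f}{\partial x}(t, b+k) - \dfrac{\partial f}{\partial x}(t, b)$. Use of the Mean Value Theorem
produces $\xi$ between $a$ and $a + h$, as well as $\eta$ between $b$ and $b + k$, such that

\[
\Delta(h, k) = \frac{u(a+h) - u(a)}{hk} = \frac{u'(\xi)}{k}
= \frac{\dfrac{\partial f}{\partial x}(\xi, b+k) - \dfrac{\partial f}{\partial x}(\xi, b)}{k}
= \frac{\partial^2 f}{\partial y\, \partial x}(\xi, \eta). \tag{$*$}
\]

Let $\epsilon > 0$ be given. By the assumed continuity of $\partial^2 f / \partial y\, \partial x$ at $(a, b)$, choose
$\delta > 0$ such that $|(h, k)| < \delta$ implies

\[
\Big| \frac{\partial^2 f}{\partial y\, \partial x}(a+h, b+k) - \frac{\partial^2 f}{\partial y\, \partial x}(a, b) \Big| < \epsilon.
\]

Then $(*)$ shows that $|(h, k)| < \delta$ implies

\[
\Big| \Delta(h, k) - \frac{\partial^2 f}{\partial y\, \partial x}(a, b) \Big| < \epsilon.
\]

Letting $k$ tend to 0 shows, for $|h| < \delta/2$, that

\[
\Bigg| \frac{\dfrac{\partial f}{\partial y}(a+h, b) - \dfrac{\partial f}{\partial y}(a, b)}{h} - \frac{\partial^2 f}{\partial y\, \partial x}(a, b) \Bigg| \le \epsilon.
\]

Since $\epsilon$ is arbitrary, $\dfrac{\partial^2 f}{\partial x\, \partial y}(a, b)$ exists and equals $\dfrac{\partial^2 f}{\partial y\, \partial x}(a, b)$.


Now that the order of partial derivatives up through order $k$ can be interchanged
arbitrarily in the case of a scalar-valued $C^k$ function, we can introduce the usual
notation $\dfrac{\partial^k f}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}}$ to indicate the result of differentiating $f$ a total of $k$
times, namely $k_1$ times with respect to $x_1$, etc., through $k_n$ times with respect
to $x_n$. Simpler notation will be introduced later to indicate such iterated partial
derivatives.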


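Theorem 3.10 (chain rule). Let $E$ be an open set in $\mathbb{R}^n$, and let $f : E \to \mathbb{R}^m$
be a function differentiable at a point $x$ in $E$. Suppose that $g$ is a function with
range $\mathbb{R}^k$ whose domain contains $f(E)$ and is a neighborhood of $f(x)$. Suppose
further that $g$ is differentiable at $f(x)$. Then the composition $g \circ f : E \to \mathbb{R}^k$ is
differentiable at $x$, and $(g \circ f)'(x) = g'(f(x)) f'(x)$.


PROOF. With $x$ fixed, define $y = f(x)$, $T = f'(x)$, $S = g'(y)$, and also

\[
u(h) = f(x+h) - f(x) - T(h) \qquad \text{and} \qquad v(k) = g(y+k) - g(y) - S(k).
\]

Differentiability of $f$ at $x$ and of $g$ at $y$ implies that

\[
|u(h)| = \varepsilon(h)|h| \qquad \text{and} \qquad |v(k)| = \eta(k)|k|
\]

with $\varepsilon(h)$ tending to 0 as $h$ tends to 0 and with $\eta(k)$ tending to 0 as $k$ tends to 0.
Given $h \ne 0$, put $k = f(x+h) - f(x)$. Then

\[
|k| = |T(h) + u(h)| \le [\|T\| + \varepsilon(h)]\,|h| \tag{$*$}
\]

and

\[
\begin{aligned}
g(f(x+h)) - g(f(x)) - (ST)(h) &= g(y+k) - g(y) - S(T(h)) \\
&= v(k) + S(k) - S(T(h)) \\
&= S(k - T(h)) + v(k) \\
&= S(u(h)) + v(k).
\end{aligned}
\]

Therefore

\[
\begin{aligned}
|h|^{-1} |g(f(x+h)) - g(f(x)) - (ST)(h)| &\le \|S\|\,|u(h)|/|h| + |v(k)|/|h| \\
&\le \|S\|\,\varepsilon(h) + \eta(k)|k|/|h| \\
&\le \|S\|\,\varepsilon(h) + \eta(k)[\|T\| + \varepsilon(h)],
\end{aligned}
\]

the last inequality following from the upper bound obtained in $(*)$ for $|k|$. As $h$
tends to 0, $k$ tends to 0, by that same bound. Thus $\varepsilon(h)$ and $\eta(k)$ tend to 0. The
theorem follows.


Let us clarify in the context of a simple example how the notation in Theorem
3.10 corresponds to the traditional notation for the chain rule. Let $f$ and $g$ be
given by

\[
\begin{pmatrix} x \\ y \end{pmatrix} = f \begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r \cos\theta \\ r \sin\theta \end{pmatrix}
\qquad \text{and} \qquad
z = g \begin{pmatrix} x \\ y \end{pmatrix} = x^2 - y^2.
\]

In traditional notation one of the partial derivatives of the composite function is
computed by starting from

\[
\frac{\partial z}{\partial r} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial r} = 2x \cos\theta - 2y \sin\theta
\]

and then substituting for $x$ and $y$ in terms of $r$ and $\theta$. In notation closer to that of
the theorem, we replace derivatives by Jacobian matrices and obtain

\[
\begin{pmatrix} \dfrac{\partial z}{\partial r} & \dfrac{\partial z}{\partial \theta} \end{pmatrix}
= \begin{pmatrix} \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \end{pmatrix}
\begin{pmatrix} \dfrac{\partial f_1}{\partial r} & \dfrac{\partial f_1}{\partial \theta} \\[1ex] \dfrac{\partial f_2}{\partial r} & \dfrac{\partial f_2}{\partial \theta} \end{pmatrix}
= \begin{pmatrix} 2x & -2y \end{pmatrix} \Big|_{x = r\cos\theta,\ y = r\sin\theta}
\begin{pmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{pmatrix}.
\]

The formula above for $\partial z / \partial r$ is just the first entry of this matrix equation.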
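
The matrix form of the chain rule in this example can be sanity-checked numerically. The sketch below is illustrative only: the sample point and the step size are arbitrary choices, and the helper names are not from the text. It multiplies the row matrix for $g'$ by the Jacobian matrix for $f'$ and compares both entries with central-difference estimates of the partial derivatives of the composite.

```python
import math

def f(r, theta):
    """Polar-coordinate map (r, theta) -> (x, y) = (r cos theta, r sin theta)."""
    return (r * math.cos(theta), r * math.sin(theta))

def g(x, y):
    """Scalar function z = x^2 - y^2."""
    return x * x - y * y

def jacobian_product(r, theta):
    """Entries of the 1-by-2 matrix g'(f(r, theta)) f'(r, theta)."""
    x, y = f(r, theta)
    g_row = (2.0 * x, -2.0 * y)
    f_matrix = ((math.cos(theta), -r * math.sin(theta)),
                (math.sin(theta), r * math.cos(theta)))
    dz_dr = g_row[0] * f_matrix[0][0] + g_row[1] * f_matrix[1][0]
    dz_dtheta = g_row[0] * f_matrix[0][1] + g_row[1] * f_matrix[1][1]
    return (dz_dr, dz_dtheta)

def finite_difference(r, theta, step=1e-6):
    """Central-difference estimates of the two partial derivatives of g o f."""
    dz_dr = (g(*f(r + step, theta)) - g(*f(r - step, theta))) / (2.0 * step)
    dz_dtheta = (g(*f(r, theta + step)) - g(*f(r, theta - step))) / (2.0 * step)
    return (dz_dr, dz_dtheta)

r0, theta0 = 1.5, 0.7            # arbitrary sample point
exact = jacobian_product(r0, theta0)
approx = finite_difference(r0, theta0)
print(exact, approx)
```

Substituting $x = r\cos\theta$ and $y = r\sin\theta$ by hand gives $\partial z/\partial r = 2r\cos 2\theta$ and $\partial z/\partial\theta = -2r^2 \sin 2\theta$, which is what the first function returns up to rounding.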

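The chain rule in several variables is a much more powerful result than its
one-variable prototype, permitting one to handle differentiations when a particular
variable occurs in several different ways within a function. For example,
consider the rule for differentiating a product in one-variable calculus. The
function $x \mapsto f(x) g(x)$ can be regarded as a composition if we recognize that
one of the ingredients is the multiplication function from $\mathbb{R}^2$ to $\mathbb{R}^1$. Thus let
$u = f(x)$ and $v = g(x)$. If we define $F(x) = \begin{pmatrix} f(x) \\ g(x) \end{pmatrix}$ and $G \begin{pmatrix} u \\ v \end{pmatrix} = uv$, then
$(G \circ F)(x) = f(x) g(x)$. Theorem 3.10 therefore gives

\[
\frac{d}{dx} (G \circ F)(x)
= \begin{pmatrix} \dfrac{\partial G}{\partial u} & \dfrac{\partial G}{\partial v} \end{pmatrix} \Big|_{\binom{u}{v} = F(x)}
\begin{pmatrix} f'(x) \\ g'(x) \end{pmatrix}
= \begin{pmatrix} g(x) & f(x) \end{pmatrix} \begin{pmatrix} f'(x) \\ g'(x) \end{pmatrix}
= g(x) f'(x) + f(x) g'(x).
\]


Theorem 3.11 (Taylor's Theorem). Let $N$ be an integer $\ge 0$, and let $E$ be an
open set in $\mathbb{R}^n$. Suppose that $F : E \to \mathbb{R}^1$ is a function of class $C^{N+1}$ on $E$ and
that the line segment from $x = (x_1, \dots, x_n)$ to $x + h$, where $h = (h_1, \dots, h_n)$,
lies in $E$. Then

\[
\begin{aligned}
F(x + h) = F(x) &+ \sum_{K=1}^{N} \ \sum_{\substack{k_1 + \cdots + k_n = K, \\ \text{all } k_j \ge 0}} \frac{1}{k_1! \cdots k_n!} \, \frac{\partial^K F(x)}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}} \, h_1^{k_1} \cdots h_n^{k_n} \\
&+ \sum_{\substack{l_1 + \cdots + l_n = N+1, \\ \text{all } l_j \ge 0}} \frac{N+1}{l_1! \cdots l_n!} \int_0^1 (1 - s)^N h_1^{l_1} \cdots h_n^{l_n} \, \frac{\partial^{N+1} F(x + sh)}{\partial x_1^{l_1} \cdots \partial x_n^{l_n}} \, ds.
\end{aligned}
\]


PROOF. Define a function $f$ of one variable by $f(t) = F(x + th)$. Taylor's
Theorem in one variable (Theorem 1.36) gives

\[
f(t) = f(0) + \sum_{K=1}^{N} \frac{1}{K!} t^K f^{(K)}(0) + \frac{1}{N!} \int_0^t (t - s)^N f^{(N+1)}(s) \, ds,
\]

and we put $t = 1$ in this formula. If $g(t) = G(x + th)$, the function $g$ is the
composition of $t \mapsto x + th$ followed by $G$, and the chain rule (Theorem 3.10)
allows us to compute its derivative as

\[
g'(t) = \begin{pmatrix} \dfrac{\partial G}{\partial x_1} & \cdots & \dfrac{\partial G}{\partial x_n} \end{pmatrix} \Big|_{x + th}
\begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}
= \sum_{j=1}^n h_j \frac{\partial G}{\partial x_j}(x + th).
\]

Taking $G$ equal to any of various iterated partial derivatives of $F$ and doing an
easy induction, we obtain

\[
f^{(K)}(s) = \sum_{\substack{k_1 + \cdots + k_n = K, \\ \text{all } k_j \ge 0}} \binom{K}{k_1, \dots, k_n} h_1^{k_1} \cdots h_n^{k_n} \, \frac{\partial^K F(x + sh)}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}},
\]

where $\dbinom{K}{k_1, \dots, k_n}$ is the multinomial coefficient $\dfrac{K!}{k_1! \cdots k_n!}$. Substitution of this
expression into the one-variable expansion with $t = 1$ yields the theorem.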
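
To see what the general formula says in a small case, it may help to write it out for $n = 2$ and $N = 1$, so that $F$ is assumed to be of class $C^2$. The following specialization is a worked instance of the statement of Theorem 3.11 and is not part of the original text. The first-order sum has the two index pairs $(k_1, k_2) = (1, 0)$ and $(0, 1)$, and the remainder has the three pairs $(l_1, l_2) = (2, 0), (1, 1), (0, 2)$ with coefficients $(N+1)/(l_1!\, l_2!) = 1, 2, 1$.

```latex
F(x + h) = F(x)
  + h_1 \frac{\partial F}{\partial x_1}(x)
  + h_2 \frac{\partial F}{\partial x_2}(x)
  + \int_0^1 (1 - s) \Big[
        h_1^2 \frac{\partial^2 F}{\partial x_1^2}(x + sh)
      + 2 h_1 h_2 \frac{\partial^2 F}{\partial x_1 \, \partial x_2}(x + sh)
      + h_2^2 \frac{\partial^2 F}{\partial x_2^2}(x + sh)
    \Big] \, ds .
```

The bracketed integrand is the second derivative of $t \mapsto F(x + th)$ at $t = s$, in agreement with the one-variable formula used in the proof.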


3. Vector-Valued Partial Derivatives and Riemann Integrals


It is useful to extend the results of Section 2 so that they become valid for functions
$f : E \to \mathbb{C}^m$, where $E$ is an open set in $\mathbb{R}^n$. Up to the chain rule in Theorem
3.10, these extensions are consequences of what has been proved in Section 2
if we identify $\mathbb{C}^m$ with $\mathbb{R}^{2m}$. Achieving the extensions by this identification is
preferable to trying to modify the original proofs because of the use of the Mean
Value Theorem in the proofs of Theorem 3.7 and Proposition 3.9.

The chain rule extends in the same fashion, once we specify what kinds of
functions are to be involved in the composition. We always want the domain to
be a subset of some $\mathbb{R}^l$, and thus in a composition $g \circ f$, we can allow $g$ to have
values in some $\mathbb{C}^k$, but we insist as in Theorem 3.10 that $f$ have values in $\mathbb{R}^m$.

Now let us turn our attention to Taylor's Theorem as in Theorem 3.11. The
statement of Theorem 3.11 allows $\mathbb{R}^1$ as range but not a general $\mathbb{R}^m$. Thus the
above extension procedure is not immediately applicable. However, if we allow
the given $F$ to take values in $\mathbb{R}^m$, a vector-valued version of Taylor's Theorem will
be valid if we adapt our definitions so that the formula remains true component by
component. For this purpose we need to enlarge two definitions (that of partial
derivatives of any order and that of 1-dimensional Riemann integration) so that
both can operate on vector-valued functions. There is no difficulty in doing so,
and we may take it that our definitions have been extended in this way.

In the case of vector-valued partial derivatives, let $f : E \to \mathbb{R}^m$ be given. Then
$\dfrac{\partial f}{\partial x_j}$ is now defined without passing to components. The entries of this
vector-valued partial derivative are exactly the entries of the $j$th column of the Jacobian
matrix of $f$. Thus the Jacobian matrix consists of the various vector-valued partial
derivatives of $f$, lined up as the columns of the matrix.

Riemann integration is being extended so that the integrand can have values
in $\mathbb{R}^m$ or $\mathbb{C}^m$, rather than just $\mathbb{R}^1$. Among the expected properties of the extended
version of the Riemann integral, one inequality needs proof because it involves
interactions among the various components of the function, namely

\[
\Big| \int_a^b F(t) \, dt \Big| \le \int_a^b |F(t)| \, dt.
\]

The Riemann integral on the left side is that of a vector-valued function, while
the one on the right side is that of a real-valued function. To prove this inequality,
let $(\cdot, \cdot)$ be the usual inner product for the range space: the dot product if the
range is Euclidean space $\mathbb{R}^m$ or the usual Hermitian inner product as in Section
II.1 if the range is complex Euclidean space $\mathbb{C}^m$. If $u$ is any vector in the range
space with $|u| = 1$, then linearity gives

\[
\Big( \int_a^b F(t) \, dt, \ u \Big) = \int_a^b (F(t), u) \, dt.
\]

Hence

\[
\Big| \Big( \int_a^b F(t) \, dt, \ u \Big) \Big| = \Big| \int_a^b (F(t), u) \, dt \Big| \le \int_a^b |(F(t), u)| \, dt \le \int_a^b |F(t)| \, dt,
\]

the two inequalities following from the known scalar-valued version of our
inequality and from the Schwarz inequality. If $\int_a^b F(t) \, dt$ is the 0 vector, then our
desired inequality is trivial. Otherwise, we specialize the above computation to
$u = \big| \int_a^b F(t) \, dt \big|^{-1} \int_a^b F(t) \, dt$, and we obtain our desired inequality.


4. Exponential of a Matrix 


In Chapter IV, we shall make use of the exponential of a matrix in connection
with ordinary differential equations. If $A$ is an n-by-n complex matrix, then we
define

\[
e^A = \sum_{N=0}^{\infty} \frac{1}{N!} A^N.
\]

This definition makes sense, according to the following proposition.


Proposition 3.12. For any n-by-n complex matrix $A$, $e^A$ is given by a
convergent series entry by entry. Moreover, the series $X \mapsto e^X$ and every partial
derivative of an entry of it is uniformly convergent on any bounded subset of
matrix space $(= \mathbb{R}^{2n^2})$, and therefore $X \mapsto e^X$ is a $C^\infty$ function.


REMARK. The proof will be tidier if we use derivatives of n-by-n matrix-valued
functions. If $F$ and $G$ are two such functions, the same argument as for the usual
product rule shows that $\frac{d}{dt}(F(t)G(t)) = F'(t)G(t) + F(t)G'(t)$.


PROOF. Let us define $\|A\|$ for an n-by-n matrix $A$ to be the operator norm of the
member of $L(\mathbb{C}^n, \mathbb{C}^n)$ with matrix $A$. Fix $M \ge 1$. On the set where $\|A\| \le M$,
we have

\[
\Big\| \sum_{N=N_1}^{N_2} \frac{1}{N!} A^N \Big\| \le \sum_{N=N_1}^{N_2} \frac{1}{N!} \|A^N\| \le \sum_{N=N_1}^{N_2} \frac{1}{N!} \|A\|^N \le \sum_{N=N_1}^{N_2} \frac{1}{N!} M^N,
\]

and the right side tends to 0 uniformly for $\|A\| \le M$ as $N_1$ and $N_2$ tend to infinity.
Hence the series for $e^A$ is uniformly Cauchy in the metric built from the operator
norm and therefore, by Proposition 3.5, uniformly Cauchy in the metric built from
the Hilbert-Schmidt norm. Uniformly Cauchy in the latter metric means that the
series is uniformly Cauchy entry by entry, and hence it is uniformly convergent.

The matrices that are 1 or $i$ in one entry and 0 in all other entries form a
$2n^2$-member basis over $\mathbb{R}$ of the n-by-n complex matrices. Call these matrices by
the names $E_j$, $1 \le j \le 2n^2$. To compute the partial derivative in the $E_j$ direction
of a function $f(A)$, we form $\frac{d}{dt} f(A + tE_j) \big|_{t=0}$. We need to estimate the operator
norm of a succession of partial derivatives applied to a term $(N!)^{-1} A^N$ of the
exponential series. Thus suppose that we have a product $f_1(A) \cdots f_N(A)$ with
each $f_i(A)$ equal to $A$ or to a constant, i.e., a matrix that does not depend on $A$.
The partial derivative in the $E_j$ direction of this product is

\[
\sum_{i=1}^{N} f_1(A) \cdots f_{i-1}(A) \Big( \frac{d}{dt} f_i(A + tE_j) \Big|_{t=0} \Big) f_{i+1}(A) \cdots f_N(A).
\]

Thus we get a sum of $N$ terms, each of which is a product of the kind we are
considering. If we repeat this process for partial derivatives in the directions of
$E_{j_1}, \dots, E_{j_k}$, we get a sum of $N^k$ terms, each of which is again a product of the
kind we are considering. For a factor $E_j$, Proposition 3.5 gives $\|E_j\| \le |E_j| = 1 \le M$.
For a factor of $A$, we have $\|A\| \le M$. Thus the operator norm of one such product
is $\le M^N$. The operator norm of the sum of all such products for a $k$th-order partial
derivative is therefore $\le N^k M^N$. Taking into account the coefficient $1/(N!)$ for
the original $A^N$, we see that the operator norm of terms $N_1$ through $N_2$ of the
term-by-term $k$-times differentiated series is

\[
\le \sum_{N=N_1}^{N_2} \frac{N^k M^N}{N!}.
\]

We see as a consequence that the term-by-term $k$-times differentiated series
obtained from $\sum (N!)^{-1} A^N$ is uniformly convergent entry by entry. By the
complex-valued version of Theorem 1.23, applied recursively to handle $k$th-order
partial derivatives, we conclude that $\exp A$ is of class $C^k$ and that the partial
derivatives can be computed term by term. Since $k$ is arbitrary, the proof is
complete.


Proposition 3.13. The exponential function for matrices satisfies

(a) $e^X e^Y = e^{X+Y}$ if $X$ and $Y$ commute,

(b) $e^X$ is nonsingular,

(c) $\frac{d}{dt}(e^{tX}) = X e^{tX}$,

(d) $e^{W^{-1} X W} = W^{-1} e^X W$ if $W$ is nonsingular,

(e) $\det e^X = e^{\mathrm{Tr}\, X}$, where the trace $\mathrm{Tr}\, X$ is the sum of the diagonal entries
of $X$.


REMARKS. The conclusion of (a) fails for general $X$ and $Y$, as one sees by
taking $X = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $Y = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Relevant properties of the determinant
function $\det$ that appears in the statement of (e) are summarized in Section A7 of
the appendix.


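PROOF. The rate of convergence determined in Proposition 3.12 is good enough
to justify the manipulations that follow. For (a), we have

\[
\begin{aligned}
e^X e^Y &= \Big( \sum_{r=0}^{\infty} \frac{1}{r!} X^r \Big) \Big( \sum_{s=0}^{\infty} \frac{1}{s!} Y^s \Big) = \sum_{r, s \ge 0} \frac{1}{r!\, s!} X^r Y^s \\
&= \sum_{N=0}^{\infty} \sum_{k=0}^{N} \frac{1}{k!\,(N-k)!} X^k Y^{N-k} = \sum_{N=0}^{\infty} \frac{1}{N!} \sum_{k=0}^{N} \binom{N}{k} X^k Y^{N-k} \\
&= \sum_{N=0}^{\infty} \frac{1}{N!} (X + Y)^N = e^{X+Y}.
\end{aligned}
\]

Conclusion (b) follows by taking $Y = -X$ in (a) and using $e^0 = 1$. For (c), we
have

\[
\begin{aligned}
\frac{d}{dt}(e^{tX}) &= \frac{d}{dt} \sum_{N=0}^{\infty} \frac{1}{N!} (tX)^N = \sum_{N=0}^{\infty} \frac{1}{N!} \frac{d}{dt}(t^N X^N) \\
&= \sum_{N=1}^{\infty} \frac{N}{N!} t^{N-1} X^N = X \sum_{N=1}^{\infty} \frac{1}{(N-1)!} (tX)^{N-1} = X e^{tX}.
\end{aligned}
\]

Conclusion (d) follows from the computation

\[
e^{W^{-1} X W} = \sum_{N=0}^{\infty} \frac{1}{N!} (W^{-1} X W)^N = \sum_{N=0}^{\infty} \frac{1}{N!} W^{-1} X^N W = W^{-1} e^X W.
\]

For conclusion (e), define a complex-valued function $f$ of one variable by
$f(t) = \det e^{tX}$. By (a), we have

\[
\begin{aligned}
f'(t) &= \frac{d}{ds} \det(e^{(t+s)X}) \Big|_{s=0} = \frac{d}{ds} \det(e^{tX} e^{sX}) \Big|_{s=0} = \frac{d}{ds} (\det e^{tX})(\det e^{sX}) \Big|_{s=0} \\
&= (\det e^{tX}) \frac{d}{ds} (\det e^{sX}) \Big|_{s=0} = f(t) \frac{d}{ds} (\det e^{sX}) \Big|_{s=0}.
\end{aligned}
\]

Now $e^{sX} = 1 + sX + \frac{1}{2} s^2 X^2 + \cdots = 1 + sX + s^2 F(s)$ for some smooth
matrix-valued function $F$ with entries $F_{ij}$. If $X$ has entries $X_{ij}$, then

\[
\det e^{sX} = \det \begin{pmatrix}
1 + sX_{11} + s^2 F_{11}(s) & sX_{12} + s^2 F_{12}(s) & \cdots \\
sX_{21} + s^2 F_{21}(s) & 1 + sX_{22} + s^2 F_{22}(s) & \cdots \\
\vdots & \vdots & \ddots
\end{pmatrix}
= 1 + s \, \mathrm{Tr}\, X + s^2 G(s)
\]

for some smooth function $G$. Thus $\frac{d}{ds} (\det e^{sX}) \big|_{s=0} = \mathrm{Tr}\, X$, and we obtain
$f'(t) = (\mathrm{Tr}\, X) f(t)$ for all $t$. Consequently

\[
\frac{d}{dt} \big( e^{-(\mathrm{Tr}\, X)t} f(t) \big) = e^{-(\mathrm{Tr}\, X)t} f'(t) - (\mathrm{Tr}\, X) e^{-(\mathrm{Tr}\, X)t} f(t) = 0
\]

for all $t$, and $e^{-(\mathrm{Tr}\, X)t} f(t)$ is a constant. The constant is seen to be 1 by putting
$t = 0$. Therefore $f(t) = e^{(\mathrm{Tr}\, X)t}$. Conclusion (e) follows by taking $t = 1$.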
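
Two of these properties are easy to test numerically with partial sums of the exponential series. The following is a minimal sketch: the helper names, the number of terms, and the particular 2-by-2 real matrices are all illustrative choices rather than anything prescribed by the text. It compares $e^X e^Y$ with $e^{X+Y}$ for one pair of matrices that do not commute, and it compares $\det e^Z$ with $e^{\mathrm{Tr}\, Z}$ for a third matrix.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    """Entry-by-entry sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_exp(a, terms=40):
    """Partial sum of the series sum_N A^N / N! using the given number of terms."""
    n = len(a)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in total]      # holds A^N / N!, starting at N = 0
    for big_n in range(1, terms):
        power = mat_mul(power, a)
        power = [[entry / big_n for entry in row] for row in power]
        total = mat_add(total, power)
    return total

def det2(a):
    """Determinant of a 2-by-2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

X = [[1.0, 0.0], [0.0, 0.0]]     # XY != YX for this illustrative pair
Y = [[0.0, 1.0], [0.0, 0.0]]
Z = [[0.3, -1.2], [0.7, 0.5]]    # arbitrary test matrix

lhs = mat_mul(mat_exp(X), mat_exp(Y))
rhs = mat_exp(mat_add(X, Y))
print(lhs, rhs)                                   # differ in the (1, 2) entry
print(det2(mat_exp(Z)), math.exp(Z[0][0] + Z[1][1]))
```

For this pair the two products differ only in the upper right entry, while multiples of a single matrix such as $Z$ and $2Z$ commute and do satisfy $e^Z e^{2Z} = e^{3Z}$.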


5. Partitions of Unity


In Section 10 we shall use a “partition of unity” in proving a change-of-variables 
formula for multiple integrals. As a general matter in analysis, a partition of unity 
serves as a tool for localizing analysis problems to a neighborhood of each point. 
The result we shall use in Section 10 is as follows. 


Proposition 3.14. Let $K$ be a compact subset of $\mathbb{R}^n$, and let $\{U_1, \dots, U_k\}$ be
a finite open cover of $K$. Then there exist continuous functions $\varphi_1, \dots, \varphi_k$ on $\mathbb{R}^n$
with values in $[0, 1]$ such that

(a) each $\varphi_i$ is 0 outside of some compact set contained in $U_i$,
(b) $\sum_{i=1}^k \varphi_i$ is identically 1 on $K$.


REMARKS. The system $\{\varphi_1, \dots, \varphi_k\}$ is an instance of a “partition of unity.”
For a general metric space $X$, a partition of unity is a family $\Phi$ of continuous
functions from $X$ into $[0, 1]$ with sum identically 1 such that for each point $x$ in
$X$, there is a neighborhood of $x$ where only finitely many of the functions are
not identically 0. The side condition about neighborhoods ensures that the sum
$\sum_{\varphi \in \Phi} \varphi(x)$ has only finitely many nonzero terms at each point and that arbitrary
partial sums are well-defined continuous functions on $X$. If $\mathcal{U}$ is an open cover of
$X$, the partition of unity is said to be subordinate to the cover $\mathcal{U}$ if each member
of $\Phi$ vanishes outside some member of $\mathcal{U}$. Further discussion of partitions of unity
beyond the present setting appears in the problems at the end of Chapter X. The
use of partitions of unity involving continuous functions tends to be good enough
for applications to integration problems, but applications to partial differential
equations and smooth manifolds are often aided by partitions of unity involving
smooth functions, rather than just continuous functions.¹


We require a lemma. 


Lemma 3.15. In $\mathbb{R}^N$,

(a) if $L$ is a compact set and $U$ is an open set with $L \subseteq U$, then there exists
an open set $V$ with $V^{\mathrm{cl}}$ compact and $L \subseteq V \subseteq V^{\mathrm{cl}} \subseteq U$,

(b) if $K$ is a compact set and $\{U_1, \dots, U_n\}$ is a finite open cover of $K$, then
there exists an open cover $\{V_1, \dots, V_n\}$ of $K$ such that $V_i^{\mathrm{cl}}$ is a compact
subset of $U_i$ for each $i$.


¹Partitions of unity involving smooth functions play no role in the present volume, but they occur
in several places in the companion volume Advanced Real Analysis, and their existence is addressed
there.


PROOF. In (a), if $L = \emptyset$, we can take $V = \emptyset$. If $L \ne \emptyset$, then the continuous
function $x \mapsto D(x, U^c)$ on $\mathbb{R}^N$ is everywhere positive on $L$ since $L \subseteq U$.
Corollary 2.39 and the compactness of $L$ show that this function attains a positive
minimum $c$ on $L$. If $R$ is chosen large enough so that $L \subseteq B(R; 0)$ and if we
take $V = \{x \in U \mid D(x, U^c) > \frac{1}{2} c\} \cap B(R; 0)$, then $L \subseteq V$, $V^{\mathrm{cl}}$ is compact
(being closed and bounded), and $V^{\mathrm{cl}} \subseteq \{x \in \mathbb{R}^N \mid D(x, U^c) \ge \frac{1}{2} c\} \subseteq U$.

For (b), since $\{U_1, \dots, U_n\}$ is a cover of $K$, we have $K - (U_2 \cup \cdots \cup U_n) \subseteq U_1$.
Part (a) produces an open set $V_1$ with $V_1^{\mathrm{cl}}$ compact such that

\[
K - (U_2 \cup \cdots \cup U_n) \subseteq V_1 \subseteq V_1^{\mathrm{cl}} \subseteq U_1.
\]

The first inclusion shows that $\{V_1, U_2, \dots, U_n\}$ is an open cover of $K$. Proceeding
inductively, let $V_i$ be an open set with

\[
K - (V_1 \cup \cdots \cup V_{i-1} \cup U_{i+1} \cup \cdots \cup U_n) \subseteq V_i \subseteq V_i^{\mathrm{cl}} \subseteq U_i.
\]

At each stage, $\{V_1, \dots, V_i, U_{i+1}, \dots, U_n\}$ is an open cover of $K$, and $V_i^{\mathrm{cl}} \subseteq U_i$.
Thus $\{V_1, \dots, V_n\}$ is an open cover of $K$, and $V_i^{\mathrm{cl}} \subseteq U_i$ for all $i$.


PROOF OF PROPOSITION 3.14. Apply Lemma 3.15b to produce an open cover
$\{W_1, \dots, W_k\}$ of $K$ such that $W_i^{\mathrm{cl}}$ is compact and $W_i^{\mathrm{cl}} \subseteq U_i$ for each $i$. Then
apply it a second time to produce an open cover $\{V_1, \dots, V_k\}$ of $K$ such that $V_i^{\mathrm{cl}}$
is compact and $V_i^{\mathrm{cl}} \subseteq W_i$ for each $i$. Proposition 2.30e produces a continuous
function $g_i \ge 0$ that is 1 on $V_i^{\mathrm{cl}}$ and is 0 off $W_i$. Then $g = \sum_{i=1}^k g_i$ is continuous
and $\ge 0$ on $\mathbb{R}^n$ and is $> 0$ everywhere on $K$. A second application of Proposition
2.30e produces a continuous function $h \ge 0$ that is 1 on the set where $g$ is 0
and is 0 on $K$. Then $g + h$ is everywhere positive on $\mathbb{R}^n$, and the functions
$\varphi_i = g_i/(g + h)$ have the required properties.
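
The construction in this proof can be followed concretely in one dimension. The sketch below is an illustration under stated assumptions, not the construction of Proposition 2.30e itself: it takes $K = [0, 1]$ with the cover $U_1 = (-1, 0.6)$, $U_2 = (0.4, 2)$, picks the shrunken intervals $W_i$ and $V_i$ by hand, and uses piecewise-linear functions for $g_1$, $g_2$, and $h$. All names and numerical endpoints are hypothetical choices made for the example.

```python
def plateau(x, a, b, c, d):
    """Continuous function equal to 1 on [b, c], 0 outside (a, d), linear between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# K = [0, 1] is covered by U_1 = (-1, 0.6) and U_2 = (0.4, 2).
# Hand-picked shrunken covers (assumptions for this example):
#   W_1 = (-0.5, 0.55),  W_2 = (0.45, 1.5)
#   V_1 = (-0.25, 0.52), V_2 = (0.48, 1.25)

def g1(x):
    """Equal to 1 on the closure of V_1 and 0 off W_1."""
    return plateau(x, -0.5, -0.25, 0.52, 0.55)

def g2(x):
    """Equal to 1 on the closure of V_2 and 0 off W_2."""
    return plateau(x, 0.45, 0.48, 1.25, 1.5)

def h(x):
    """Equal to 0 on K and to 1 where g1 + g2 = 0, i.e. off (-0.5, 1.5)."""
    distance_to_k = max(0.0, -x, x - 1.0)
    return min(1.0, 2.0 * distance_to_k)

def phi(i, x):
    """The function g_i / (g + h) for i = 1 or 2."""
    total = g1(x) + g2(x) + h(x)          # positive at every real x
    return (g1(x) if i == 1 else g2(x)) / total

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, phi(1, x), phi(2, x), phi(1, x) + phi(2, x))
```

On $K$ the function $h$ vanishes, so the two quotients add up to 1 there, while each quotient is 0 outside the closure of the corresponding $W_i$, a compact subset of $U_i$.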


6. Inverse and Implicit Function Theorems 


The Inverse Function Theorem and the Implicit Function Theorem are results for 
working with coordinate systems and for defining functions by means of solving 
equations. Let us use the latter application as a device for getting at the statements 
of both the theorems. 

In the one-variable situation we are given some equation, such as $x^2 + y^2 = a^2$, 
and we are to think of solving for y in terms of x, choosing one of the possible 
y's for each x. For example, one solution is $y = -\sqrt{a^2 - x^2}$, $-a < x < a$; 
unless some requirement like continuity is imposed, there are infinitely many 
such solutions. In one-variable calculus the terminology is that this solution is 
“defined implicitly” by the given equation. In terms of functions, the functions 
$F(x, y) = x^2 + y^2 - a^2$ and $y = f(x) = -\sqrt{a^2 - x^2}$ are such that $F(x, f(x))$ 
is identically 0. It is then possible to compute dy/dx for this solution in two 
ways. Only one of these methods remains within the subject of one-variable 
calculus, namely to compute the “total differential” of $x^2 + y^2 - a^2$, however that 
is defined, and to set the result equal to 0. One obtains $2x\,dx + 2y\,dy = 0$ with 
x and y playing symmetric roles. The declaration that x is to be an independent 
variable and y is to be dependent means that we solve for dy/dx, obtaining 
$dy/dx = -x/y$. The other way is more transparent conceptually but makes 
use of multivariable calculus: it uses the chain rule in two-variable calculus to 
compute d/dx of $F(x, f(x))$ as the derivative of a composition, the result being 
set equal to 0 because $(d/dx)F(x, f(x))$ is the derivative of the 0 function. This 
second method gives $\frac{\partial F}{\partial x} + \frac{\partial F}{\partial y} f'(x) = 0$, with the partial derivatives evaluated 
where $(x, y) = (x, f(x))$. Then we can solve for $f'(x)$ provided $\partial F/\partial y$ is not 
zero at a point of interest, again obtaining $f'(x) = -x/y$. It is an essential feature 
of both methods that the answer involves both x and y; the reason is that there 
is more than one choice of y for some x's, and thus specifying x alone does not 
determine all possibilities for $f'(x)$. 
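
The one-variable computation can be checked numerically. The following Python sketch is only an illustration: the radius a = 2, the sample point x = 0.5, and the helper names are assumptions introduced here, and the derivative is approximated by a central difference rather than computed exactly.

```python
import math

def f(x, a):
    """Lower-semicircle solution y = -sqrt(a^2 - x^2) of x^2 + y^2 = a^2."""
    return -math.sqrt(a * a - x * x)

def implicit_slope(x, y):
    """dy/dx = -x/y, read off from 2x dx + 2y dy = 0 (requires y != 0)."""
    return -x / y

def numeric_slope(x, a, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h, a) - f(x - h, a)) / (2 * h)

a, x = 2.0, 0.5          # illustrative values, not taken from the text
y = f(x, a)
print(implicit_slope(x, y), numeric_slope(x, a))
```

Evaluating `implicit_slope(x, -y)` gives the slope on the upper semicircle at the same x, which has the opposite sign; this is the sense in which the answer must involve both x and y.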

In the general situation we have m equations in n + m variables. Some n of 
the variables are regarded as independent, and we think in terms of solving for 
the other m. An example is 


\[
\begin{aligned}
z^3 x + w^2 y^3 + 2xy &= 0,\\
xyzw - 1 &= 0,
\end{aligned}
\]


with x and y regarded as the independent variables. 
The classical method of implicit differentiation, which is a version of the first 
method above, is again to form “total differentials” 


\[
\begin{aligned}
2wy^3\,dw + 3z^2x\,dz + (z^3 + 2y)\,dx + (3w^2y^2 + 2x)\,dy &= 0,\\
xyz\,dw + xyw\,dz + yzw\,dx + xzw\,dy &= 0,
\end{aligned}
\]


and then to solve the resulting system of equations for dw and dz in terms of dx 
and dy. The system is 


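\[
\begin{pmatrix} 2wy^3 & 3z^2x \\ xyz & xyw \end{pmatrix}
\begin{pmatrix} dw \\ dz \end{pmatrix}
=
\begin{pmatrix} -(z^3 + 2y)\,dx - (3w^2y^2 + 2x)\,dy \\ -(yzw)\,dx - (xzw)\,dy \end{pmatrix},
\]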


and the solution is of the form 


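\[
\begin{aligned}
dw &= \text{coefficient}\;dx + \text{coefficient}\;dy,\\
dz &= \text{coefficient}\;dx + \text{coefficient}\;dy.
\end{aligned}
\]

Here the coefficients are the various partial derivatives of interest. Specifically 

\[
\begin{aligned}
dw &= \frac{\partial w}{\partial x}\,dx + \frac{\partial w}{\partial y}\,dy,\\
dz &= \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy.
\end{aligned}
\]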


The analog of the second method above is to set up matters as a computation 
of the derivative of a composition. Namely, we write 


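\[
F\begin{pmatrix} x \\ y \\ w \\ z \end{pmatrix}
= \begin{pmatrix} z^3x + w^2y^3 + 2xy \\ xyzw - 1 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} w \\ z \end{pmatrix} = f\begin{pmatrix} x \\ y \end{pmatrix},
\quad\text{so that}\quad
F\bigl(x, y, f(x, y)\bigr) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]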


We apply the chain rule and compute Jacobian matrices of derivatives, keeping 
the variables in the same order x, y, w, z. The Jacobian matrix of the 0 function is 
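a 0 matrix of the appropriate size, and the other side of the differentiated equation 
is the product of two matrices. Thus 

\[
\begin{aligned}
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
&= \begin{pmatrix} z^3 + 2y & 3w^2y^2 + 2x & 2wy^3 & 3z^2x \\ yzw & xzw & xyz & xyw \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \\ \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \end{pmatrix} \\
&= \begin{pmatrix} z^3 + 2y & 3w^2y^2 + 2x \\ yzw & xzw \end{pmatrix}
+ \begin{pmatrix} 2wy^3 & 3z^2x \\ xyz & xyw \end{pmatrix}
\begin{pmatrix} \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \\ \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \end{pmatrix}.
\end{aligned}
\]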


In other words, 


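\[
\begin{pmatrix} 2wy^3 & 3z^2x \\ xyz & xyw \end{pmatrix}
\begin{pmatrix} \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \\ \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \end{pmatrix}
= -\begin{pmatrix} z^3 + 2y & 3w^2y^2 + 2x \\ yzw & xzw \end{pmatrix},
\]

and we have the same system of linear equations as before. Comparing the two 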
methods, we see that we have computed the same things in both methods, merely 
giving them different names; thus the two methods will lead to the same result in 
general, not merely in this one example. 
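
As a numerical sanity check on this example, the following Python sketch solves the linear system for the partial derivatives at one point and compares the result with finite differences. Everything beyond the two equations and the two coefficient matrices is an assumption made for illustration: the sample point (x, y) = (1, −1), the starting guess for (w, z), the use of Newton's method to locate a solution, and all helper names.

```python
def F(x, y, w, z):
    """The two left-hand sides of the example system."""
    return (z**3 * x + w**2 * y**3 + 2 * x * y, x * y * z * w - 1.0)

def dF_dwz(x, y, w, z):
    """Coefficient matrix: partials of F in the dependent variables (w, z)."""
    return ((2 * w * y**3, 3 * z**2 * x), (x * y * z, x * y * w))

def dF_dxy(x, y, w, z):
    """Partials of F in the independent variables (x, y)."""
    return ((z**3 + 2 * y, 3 * w**2 * y**2 + 2 * x), (y * z * w, x * z * w))

def solve2(A, b):
    """Solve a 2-by-2 linear system A u = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det)

def solve_wz(x, y, w, z, steps=50):
    """Newton iteration in (w, z) for fixed (x, y), starting from a guess."""
    for _ in range(steps):
        r = F(x, y, w, z)
        dw, dz = solve2(dF_dwz(x, y, w, z), (-r[0], -r[1]))
        w, z = w + dw, z + dz
    return w, z

def partials(x, y, w, z):
    """Return ((dw/dx, dz/dx), (dw/dy, dz/dy)) from the linear system."""
    A, B = dF_dwz(x, y, w, z), dF_dxy(x, y, w, z)
    return (solve2(A, (-B[0][0], -B[1][0])), solve2(A, (-B[0][1], -B[1][1])))

x0, y0 = 1.0, -1.0                       # illustrative point
w0, z0 = solve_wz(x0, y0, -0.7, 1.4)     # illustrative starting guess
(dwdx, dzdx), (dwdy, dzdy) = partials(x0, y0, w0, z0)

h = 1e-5
wp, zp = solve_wz(x0 + h, y0, w0, z0)
wm, zm = solve_wz(x0 - h, y0, w0, z0)
print(dwdx, (wp - wm) / (2 * h))
print(dzdx, (zp - zm) / (2 * h))
```

The sketch presupposes that the equations can be solved near the chosen point, which is exactly the theoretical question taken up next.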
The theoretical question is whether the given system of equations, which was 
$F(x, y, w, z) = 0$ above, can in principle be solved to give a differentiable 
function; the latter was $\begin{pmatrix} w \\ z \end{pmatrix} = f\begin{pmatrix} x \\ y \end{pmatrix}$ above. The two computational methods 
show what the partial derivatives are if the equations can be solved, but these 
methods by themselves give no information about the theoretical question. The 
theoretical question is answered by the Implicit Function Theorem, which says 
that there is no problem if the coefficient matrix of our system of linear equations, 
namely $\begin{pmatrix} 2wy^3 & 3z^2x \\ xyz & xyw \end{pmatrix}$ in the above example, is invertible at a point of interest. 

Theorem 3.16 (Implicit Function Theorem). Suppose that F is a $C^1$ function 
from an open set E in $\mathbb{R}^{n+m}$ into $\mathbb{R}^m$ and that $F(a, b) = 0$ for some $(a, b)$ in 
E, with a understood to be in $\mathbb{R}^n$ and b understood to be in $\mathbb{R}^m$. If the matrix 
$\left[\frac{\partial F_i}{\partial y_j}\right]_{x=a,\,y=b}$ is invertible, then there exist open sets $U \subseteq \mathbb{R}^{n+m}$ and $W \subseteq \mathbb{R}^n$ 
with $(a, b)$ in U and a in W with this property: to each x in W corresponds a 
unique y in $\mathbb{R}^m$ such that $(x, y)$ is in U and $F(x, y) = 0$. If this y is defined as 
$f(x)$, then f is a $C^1$ function from W into $\mathbb{R}^m$ such that $f(a) = b$, the expression 
$F(x, f(x))$ is identically 0 for x in W, and 

\[
f'(x) = -\left[\frac{\partial F_i}{\partial y_j}\right]^{-1}\left[\frac{\partial F_i}{\partial x_j}\right] \quad\text{at } (x, y) = (x, f(x)).
\]


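We shall come to the proof shortly. In the example above, $f'(x)$ is the matrix 
$\begin{pmatrix} \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \\ \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \end{pmatrix}$, $\left[\frac{\partial F_i}{\partial y_j}\right]$ is $\begin{pmatrix} 2wy^3 & 3z^2x \\ xyz & xyw \end{pmatrix}$, and $\left[\frac{\partial F_i}{\partial x_j}\right]$ is $\begin{pmatrix} z^3 + 2y & 3w^2y^2 + 2x \\ yzw & xzw \end{pmatrix}$. 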


Let us use the same approach to the question of introducing a new coordinate 
system in place of an old one. For example, we start with ordinary Euclidean 
coordinates $(u, v)$ for $\mathbb{R}^2$, and we want to know whether polar coordinates $(r, \theta)$ 
define a legitimate coordinate system in their place. The formula for passing from 
one system to the other is $\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}$, but this formula does not really define 
r and $\theta$. Defining r and $\theta$ entails solving for r and $\theta$ in terms of u and v. Thus 
let us set up the system 

\[
\begin{aligned}
r\cos\theta - u &= 0,\\
r\sin\theta - v &= 0.
\end{aligned}
\]


This is a system of the kind in the Implicit Function Theorem, and the 
considerations in that theorem apply. The independent vector variable is to be 
$x = \begin{pmatrix} u \\ v \end{pmatrix}$, and the dependent vector variable is to be $y = \begin{pmatrix} r \\ \theta \end{pmatrix}$. The system 
itself is $F(u, v, r, \theta) = 0$, where 

\[
F(u, v, r, \theta) = \begin{pmatrix} F_1(u, v, r, \theta) \\ F_2(u, v, r, \theta) \end{pmatrix} = \begin{pmatrix} r\cos\theta - u \\ r\sin\theta - v \end{pmatrix}.
\]

The sufficient condition for solving the equations locally is that the matrix $\left[\frac{\partial F_i}{\partial y_j}\right]$ 
be invertible at a point of interest. This is just the matrix 

\[
\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.
\]

The determinant is r, and hence the matrix is invertible except where r = 0. The 
Implicit Function Theorem is therefore telling us in this special case that r and $\theta$ 
give us good local coordinates for $\mathbb{R}^2$ except possibly where r = 0. The Implicit 
Function Theorem gives no information about what happens when r = 0. 
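
The determinant computation for polar coordinates is easy to confirm numerically. The Python sketch below is only an illustration; the helper names and the sample values of r and θ are assumptions introduced here.

```python
import math

def polar_jacobian(r, theta):
    """Matrix of partials of (r cos t - u, r sin t - v) in the variables (r, t)."""
    return ((math.cos(theta), -r * math.sin(theta)),
            (math.sin(theta), r * math.cos(theta)))

def det2(m):
    """Determinant of a 2-by-2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    return a * d - b * c

# Illustrative sample points; the last one has r = 0.
for r, theta in [(2.0, 0.3), (0.5, 2.0), (0.0, 1.0)]:
    print(r, theta, det2(polar_jacobian(r, theta)))
```

In each case the printed determinant agrees with r up to rounding, and it vanishes at the sample point with r = 0, where the theorem gives no information.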


The general result about introducing a new coordinate system in place of an 
old one is as follows. 


Theorem 3.17 (Inverse Function Theorem). Suppose that $\varphi$ is a $C^1$ function 
from an open set E of $\mathbb{R}^n$ into $\mathbb{R}^n$, and suppose that $\varphi'(a)$ is invertible for some 
a in E. Put $b = \varphi(a)$. Then 


(a) there exist open sets $U \subseteq E \subseteq \mathbb{R}^n$ and $V \subseteq \mathbb{R}^n$ such that a is in U, b is 
in V, $\varphi$ is one-one from U onto V, and 
(b) the inverse $f : V \to U$ is of class $C^1$. 


Consequently, $f'(\varphi(x)) = \varphi'(x)^{-1}$ for x in U. 
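
To see the formula for the derivative of the inverse in a concrete case, the following Python sketch uses the map (x, y) ↦ (e^x cos y, e^x sin y), whose derivative is invertible everywhere although the map is not globally one-one. The choice of this map, the explicit local inverse, and the helper names are assumptions made for illustration and do not come from the text.

```python
import math

def phi(x, y):
    """Illustrative C^1 map of R^2 into R^2."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def phi_prime(x, y):
    """Jacobian matrix of phi; its determinant is exp(2x), never 0."""
    ex = math.exp(x)
    return ((ex * math.cos(y), -ex * math.sin(y)),
            (ex * math.sin(y), ex * math.cos(y)))

def f(u, v):
    """A local inverse of phi, valid for -pi < y < pi under this choice."""
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def f_prime(u, v):
    """Jacobian matrix of f, computed by hand from the formula for f."""
    s = u * u + v * v
    return ((u / s, v / s), (-v / s, u / s))

def matmul2(a, b):
    """Product of two 2-by-2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

x, y = 0.2, 0.3                      # illustrative point
u, v = phi(x, y)
product = matmul2(f_prime(u, v), phi_prime(x, y))
print(product)                       # close to the identity matrix
print(phi(x, y + 2 * math.pi))       # same image point: phi is not globally one-one
```

The product of the two Jacobian matrices is the identity up to rounding, as the relation f'(φ(x)) = φ'(x)^{-1} requires, while the second printed value shows why the theorem can only be local.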


REMARKS. Theorems 3.16 and 3.17 are closely related. We saw in the con- 
text of polar coordinates that the Implicit Function Theorem implies the Inverse 
Function Theorem, and Problem 6 at the end of the chapter points out that this 
implication is valid in complete generality. Actually, the implication goes both 
ways, and within this section we shall follow the more standard approach of 
deriving the Implicit Function Theorem from the Inverse Function Theorem and 
subsequently proving the Inverse Function Theorem on its own. 


PROOF OF THEOREM 3.16 IF THEOREM 3.17 IS KNOWN. Let n, m, E, F, and 
$(a, b)$ be given as in the statement of Theorem 3.16. We define a function to 
which we shall apply Theorem 3.17 in dimension n + m. The function is 

\[
\varphi(x, y) = (x, F(x, y)) \quad\text{for } (x, y) \text{ in } E.
\]

This satisfies $\varphi(a, b) = (a, F(a, b)) = (a, 0)$, and its Jacobian matrix at $(a, b)$ 
is 

\[
\varphi'(a, b) = \begin{pmatrix} 1 & 0 \\ \left[\frac{\partial F_i}{\partial x_j}\right] & \left[\frac{\partial F_i}{\partial y_j}\right] \end{pmatrix}_{x=a,\,y=b},
\]

where 1 denotes the n-by-n identity matrix. Since Theorem 3.16 has assumed that 
$\left[\frac{\partial F_i}{\partial y_j}\right]_{x=a,\,y=b}$ is invertible, $\varphi'(a, b)$ is 
invertible. Theorem 3.17 therefore applies to $\varphi$ and produces an open 
neighborhood $W'$ of $\varphi(a, b) = (a, 0)$ such that $\varphi^{-1}$ exists on $W'$ and carries $W'$ to an open 
set. Let $U = \varphi^{-1}(W')$. Define W to be the open neighborhood $W' \cap (\mathbb{R}^n \times \{0\})$ 
of a in $\mathbb{R}^n$, and define $f(x)$ for x in W by $(x, f(x)) = \varphi^{-1}(x, 0)$. Then f is 
of class $C^1$ on W, and $f(a) = b$ because $(a, f(a)) = \varphi^{-1}(a, 0) = (a, b)$. The 
identity 

\[
(x, 0) = \varphi(\varphi^{-1}(x, 0)) = \varphi(x, f(x)) = (x, F(x, f(x)))
\]

shows that $F(x, f(x)) = 0$ for x in W. The latter equation and the chain rule 
(Theorem 3.10) give the formula for $f'(x)$. 

Finally we are to see that $y = f(x)$ is the unique y in $\mathbb{R}^m$ for which $(x, y)$ is in 
U and $F(x, y) = 0$. Thus suppose that x is in W and that $y_1$ and $y_2$ are in $\mathbb{R}^m$ with 
$(x, y_1)$ and $(x, y_2)$ in U and $F(x, y_1) = F(x, y_2) = 0$. Then we have $\varphi(x, y_1) = 
(x, F(x, y_1)) = (x, 0) = (x, F(x, y_2)) = \varphi(x, y_2)$. Since $(x, y_1)$ and $(x, y_2)$ are 
in U, we can apply $\varphi^{-1}$ to this equation and obtain $(x, y_1) = (x, y_2)$. Therefore 
$y_1 = y_2$. This completes the proof of Theorem 3.16 if Theorem 3.17 is known. 


Let us turn our attention to a direct proof of the Inverse Function Theorem 
(Theorem 3.17). When the dimension n is 1, a nonzero derivative at a point 
yields monotonicity, and the theorem is greatly simplified; this special case is the 
subject of Section A3 of the appendix. 

For general dimension n, it may be helpful to begin with an outline of the 
proof. The first step is to show that $\varphi$ is one-one near the point a in 
question; this is relatively easy. The hard step is to prove that $\varphi$ is locally onto 
some open set; this uses either the compactness of closed balls or else their 
completeness, and we return to a discussion of this step in a moment. The 
argument for differentiability of the inverse function depends on the continuity 
of the inverse function; this dependence was already true in the 1-dimensional 
case in Section A3 of the appendix. Continuity of the inverse function amounts 
to the fact that small open neighborhoods of a get carried to open sets, and this is 
part of the proof that $\varphi$ is locally onto some open set. Finally the chain rule gives 
$(\varphi^{-1})'(x) = (\varphi'(\varphi^{-1}(x)))^{-1}$, and the continuity of $(\varphi^{-1})'$ follows. Thus $\varphi^{-1}$ is 
of class $C^1$. 

In carrying out the hard step, one has a choice of using either the compactness 
of closed balls or else their completeness. The argument using completeness 
lends itself to certain infinite-dimensional generalizations that are well beyond 
the scope of this book. Since the argument using compactness is the easier one, 
we shall use that. 

The first step and the hard step mentioned above will be carried out in three 
lemmas below. After them we address the continuity and differentiability of the 
inverse function, and the proof of the Inverse Function Theorem will be complete. 


Lemma 3.18. If $L : \mathbb{R}^n \to \mathbb{R}^n$ is a linear function that is invertible, then there 
exists a real number m > 0 such that $|L(y)| \ge m|y|$ for all y in $\mathbb{R}^n$. 


REMARK. We shall apply this lemma in Lemma 3.19 with $L = \varphi'(a)$. 

PROOF. The linear inverse function $L^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ is one-one and onto. Thus 
if y is given, there exists x with $y = L^{-1}(x)$, and we have $|y| = |L^{-1}(x)| \le 
\|L^{-1}\|\,|x| = \|L^{-1}\|\,|L(y)|$. The lemma follows with $m = \|L^{-1}\|^{-1}$. 


Lemma 3.19. In the notation of Theorem 3.17 and Lemma 3.18, choose 
m > 0 such that $|\varphi'(a)(y)| \ge m|y|$ for all $y \in \mathbb{R}^n$, and choose, by continuity 
of $\varphi'$, any $\delta > 0$ for which $x \in B(\delta; a)$ implies $\|\varphi'(x) - \varphi'(a)\| \le \frac{m}{2\sqrt{n}}$. Then 
$|\varphi(x') - \varphi(x)| \ge \frac{m}{2\sqrt{n}}\,|x' - x|$ whenever $x'$ and x are both in $B(\delta; a)$. 

REMARKS. This proves immediately that $\varphi$ is one-one on $B(\delta; a)$, and it gives 
an estimate that will establish that $\varphi^{-1}$ is continuous, once $\varphi^{-1}$ is known to exist. 
It proves also that the linear function $\varphi'(x)$ is invertible for $x \in B(\delta; a)$ because 

\[
\begin{aligned}
m|y| &\le |\varphi'(a)(y)| \\
&\le |\varphi'(x)(y)| + |\varphi'(a)(y) - \varphi'(x)(y)| \\
&\le |\varphi'(x)(y)| + \|\varphi'(a) - \varphi'(x)\|\,|y| \\
&\le |\varphi'(x)(y)| + \frac{m|y|}{2\sqrt{n}};
\end{aligned}
\]

if $\varphi'(x)$ were not invertible, then any nonzero y in the kernel of $\varphi'(x)$ would 
contradict this chain of inequalities. 



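PROOF. The line segment from x to $x'$ lies within $B(\delta; a)$. Put $z = x' - x$, 
write this line segment as $t \mapsto x + tz$ for $0 \le t \le 1$, and apply the Mean Value 
Theorem to each component $\varphi_k$ of $\varphi$ to obtain 

\[
\begin{aligned}
\varphi_k(x') - \varphi_k(x) &= \varphi_k(x + tz)\big|_{t=1} - \varphi_k(x + tz)\big|_{t=0} \\
&= \varphi'(x + t_k z)(z) \cdot e_k \qquad\text{with } 0 < t_k < 1 \\
&= \varphi'(a)(z) \cdot e_k + (\varphi'(x + t_k z) - \varphi'(a))(z) \cdot e_k.
\end{aligned}
\]

Taking the absolute value of both sides allows us to write 

\[
\begin{aligned}
|\varphi(x') - \varphi(x)| &\ge |\varphi_k(x') - \varphi_k(x)| \\
&\ge |\varphi'(a)(z) \cdot e_k| - |(\varphi'(x + t_k z) - \varphi'(a))(z)| \\
&\ge |\varphi'(a)(z) \cdot e_k| - \frac{m}{2\sqrt{n}}\,|x' - x|.
\end{aligned}
\]

Therefore, choosing k so that $|\varphi'(a)(z) \cdot e_k|$ is as large as possible, which makes it 
at least $\frac{1}{\sqrt{n}}|\varphi'(a)(z)|$, we obtain 

\[
\begin{aligned}
|\varphi(x') - \varphi(x)| &\ge \frac{1}{\sqrt{n}}\,|\varphi'(a)(z)| - \frac{m}{2\sqrt{n}}\,|x' - x| \\
&\ge \frac{m}{\sqrt{n}}\,|x' - x| - \frac{m}{2\sqrt{n}}\,|x' - x| \\
&= \frac{m}{2\sqrt{n}}\,|x' - x|.
\end{aligned}
\]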


Lemma 3.20. With notation as in Lemma 3.19, $\varphi(B(\delta; a))$ is open in $\mathbb{R}^n$. 


PROOF. Let $c = m/(2\sqrt{n})$ be the constant in the statement of Lemma 3.19. 
Fix $x_0$ in $B(\delta; a)$ and let $y_0 = \varphi(x_0)$, so that $y_0$ is the most general element of 
$\varphi(B(\delta; a))$. Find $\delta_1 > 0$ such that $B(\delta_1; x_0)^{\mathrm{cl}} \subseteq B(\delta; a)$. It is enough to prove 
that $B(c\delta_1/2; y_0) \subseteq \varphi(B(\delta; a))$. Even better, we prove that $B(c\delta_1/2; y_0) \subseteq 
\varphi(B(\delta_1; x_0)^{\mathrm{cl}})$. 

Thus let $y_1$ have $|y_1 - y_0| < c\delta_1/2$. Choose, by compactness of $B(\delta_1; x_0)^{\mathrm{cl}}$, 
a member $x = x_1$ of $B(\delta_1; x_0)^{\mathrm{cl}}$ for which $|\varphi(x) - y_1|^2$ is minimized. Let us 
show that $x_1$ is not on the edge of $B(\delta_1; x_0)^{\mathrm{cl}}$, i.e., that $|x_1 - x_0| < \delta_1$. In fact, if 
$|x_1 - x_0| = \delta_1$, then Lemma 3.19 gives 

\[
\begin{aligned}
|\varphi(x_1) - y_1| &\ge |\varphi(x_1) - y_0| - |y_1 - y_0| \\
&> |\varphi(x_1) - \varphi(x_0)| - c\delta_1/2 \\
&\ge c|x_1 - x_0| - c\delta_1/2 \\
&= c\delta_1/2 \\
&> |y_1 - y_0| \\
&= |\varphi(x_0) - y_1|,
\end{aligned}
\]

in contradiction to the fact that $|\varphi(x) - y_1|^2$ is minimized on $B(\delta_1; x_0)^{\mathrm{cl}}$ at $x = x_1$. 
Thus $|x_1 - x_0| < \delta_1$. In this case the scalar-valued function $(\varphi(x) - y_1) \cdot (\varphi(x) - y_1)$ 
is minimized at an interior point of $B(\delta_1; x_0)^{\mathrm{cl}}$, and all its partial derivatives must 
be 0. Therefore $\varphi'(x_1)(z) \cdot (\varphi(x_1) - y_1) = 0$ for all z in $\mathbb{R}^n$. Since the linear 
function $\varphi'(x_1)$ is onto $\mathbb{R}^n$, we conclude that $\varphi(x_1) - y_1 = 0$, and the lemma 
follows. 

COMPLETION OF PROOF OF THEOREM 3.17. Lemma 3.19 showed that the 
restriction of $\varphi$ to $B(\delta; a)$ is one-one, and Lemma 3.20 showed that the image is 
an open set in $\mathbb{R}^n$. Let $f : \varphi(B(\delta; a)) \to B(\delta; a)$ be the inverse function. To 
complete the proof of Theorem 3.17, we need to see that f is differentiable on 
$\varphi(B(\delta; a))$. Fix x in $B(\delta; a)$, and suppose that x + h is in $B(\delta; a)$ with $h \ne 0$. 
Define y and k by $y = \varphi(x)$ and $y + k = \varphi(x + h)$. Since $\varphi$ is one-one on 
$B(\delta; a)$, k is not 0. In fact, Lemma 3.19 gives 

\[
|k| \ge c|h|, \tag{$*$}
\]

where $c = m/(2\sqrt{n})$. The definitions give 

\[
\begin{aligned}
f(y + k) - f(y) - \varphi'(x)^{-1}(k) &= (x + h) - x - \varphi'(x)^{-1}(k) \\
&= h - \varphi'(x)^{-1}(\varphi(x + h) - \varphi(x)) \\
&= -\varphi'(x)^{-1}(\varphi(x + h) - \varphi(x) - \varphi'(x)(h)).
\end{aligned}
\]

Combining this identity with $(*)$ gives 

\[
\frac{|f(y + k) - f(y) - \varphi'(x)^{-1}(k)|}{|k|} \le \frac{\|\varphi'(x)^{-1}\|\,|\varphi(x + h) - \varphi(x) - \varphi'(x)(h)|}{c\,|h|}.
\]

If $\varepsilon > 0$ is given, choose $\eta > 0$ small enough so that 

\[
\frac{\|\varphi'(x)^{-1}\|\,|\varphi(x + h) - \varphi(x) - \varphi'(x)(h)|}{c\,|h|} < \varepsilon
\]

as long as $|h| < \eta$. If $|k| < c\eta$, then $|h| < \eta$ by $(*)$ and hence 

\[
\frac{|f(y + k) - f(y) - \varphi'(x)^{-1}(k)|}{|k|} < \varepsilon.
\]

In other words, f is differentiable at y, and $f'(y) = \varphi'(x)^{-1}$. 


Suppose that the given function $\varphi$ in the Inverse Function Theorem is better 
than a $C^1$ function. What can be said about the inverse function? The answer 
is carried by the formula $f'(\varphi(x)) = \varphi'(x)^{-1}$ for the derivative of the inverse 
function f. This formula implies that the partial derivatives of f are quotients 
of polynomials in partial derivatives of $\varphi$ by a nonvanishing polynomial (the 
determinant) in partial derivatives of $\varphi$. Thus the iterated partial derivatives of f 
can be computed harmlessly in terms of the iterated partial derivatives of $\varphi$ and 
this same determinant polynomial. Consequently if $\varphi$ is of class $C^k$ with $k \ge 1$, 
then so is f. If $\varphi$ is smooth, so is f. In the case that $\varphi$ and f are both smooth, 
we say that $\varphi$ is a diffeomorphism. Let us summarize these facts in a corollary. 

Corollary 3.21. Suppose, for some $k \ge 1$, that $\varphi$ is a $C^k$ function from an 
open set E of $\mathbb{R}^n$ into $\mathbb{R}^n$, and suppose that $\varphi'(a)$ is invertible for some a in E. 
Put $b = \varphi(a)$. Let U and V be open subsets of $\mathbb{R}^n$ as in the Inverse Function 
Theorem such that a is in U, b is in V, and $\varphi$ is one-one from U onto V. Then 
the inverse function $f : V \to U$ is of class $C^k$. If $\varphi$ is smooth, then $\varphi$ is a 
diffeomorphism of U onto V. 


7. Definition and Properties of Riemann Integral 


Section I.4 contained a careful but limited development of the Riemann integral in 
one variable. The present section extends that development to several variables. 
A certain amount of the theory parallels what happened in one variable, and proofs 
for that part of the theory can be obtained by adjusting the notation and words of 
Section I.4 in simple ways. Results of that kind are much of the subject matter 
of this section. 

In later sections we shall take up results having no close analog in Section I.4. 
The main results of this kind are 


(i) a necessary and sufficient condition for a function to be Riemann inte- 
grable, 
(ii) Fubini’s Theorem, concerning the relationship between multiple integrals 
and iterated integrals in the various possible orders, 
(iii) a change-of-variables formula for multiple integrals. 


We begin a discussion of these in the next section. 

The one-variable theory worked with a bounded function $f : [a, b] \to \mathbb{R}$, with 
domain a closed bounded interval, and we now work with a bounded function 
$f : A \to \mathbb{R}$ with domain A a “closed rectangle” in $\mathbb{R}^n$. For this purpose a closed 
rectangle (or “closed geometric rectangle”) in $\mathbb{R}^n$ is a bounded set of the form 

\[
A = [a_1, b_1] \times \cdots \times [a_n, b_n]
\]

with $a_j \le b_j$ for all j. Let us abbreviate $[a_j, b_j]$ as $A_j$. In geometric terms 
the sides or faces are assumed parallel to the axes or coordinate hyperplanes. 
We shall use the notion of open rectangle in later sections and chapters, an open 
rectangle being a similar product of bounded open intervals $(a_j, b_j)$ for $1 \le j \le n$. 
However, in this section the term “rectangle” will always mean closed rectangle. 

If $P_j$ is a one-variable partition of $A_j$, then we can form an n-variable partition 
$P = (P_1, \dots, P_n)$ of the given rectangle A into component rectangles $[c_1, d_1] \times 
\cdots \times [c_n, d_n]$, where $c_j$ and $d_j$ are consecutive subdivision points of $P_j$. A typical 
component rectangle is denoted by R, and its n-dimensional volume $\prod_{j=1}^{n}(d_j - c_j)$ 
is denoted by $|R|$. The mesh $\mu(P)$ of the partition P is the maximum of the 
meshes of the one-dimensional partitions $P_j$, hence the largest length of a side of 
all component rectangles of P. 

Relative to our given function f and a given partition P, define $M_R(f) = 
\sup_{x \in R} f(x)$ and $m_R(f) = \inf_{x \in R} f(x)$ for each component rectangle R of P. 
Put 

\[
\begin{aligned}
U(P, f) &= \sum_R M_R(f)\,|R| = \text{upper Riemann sum for } P,\\
L(P, f) &= \sum_R m_R(f)\,|R| = \text{lower Riemann sum for } P,\\
\overline{\int_A} f\,dx &= \inf_P U(P, f) = \text{upper Riemann integral of } f,\\
\underline{\int_A} f\,dx &= \sup_P L(P, f) = \text{lower Riemann integral of } f.
\end{aligned}
\]

We say that f is Riemann integrable on A if $\underline{\int_A} f\,dx = \overline{\int_A} f\,dx$, and in this 
case we write $\int_A f\,dx$ for the common value of these two numbers. We write 
$\mathcal{R}(A)$ for the set of Riemann integrable functions on A. The following lemma is 
proved in the same way as Lemma 1.24. 
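
The definitions of the upper and lower Riemann sums can be made concrete for a simple integrand. The Python sketch below is an illustration under stated assumptions: the function f(x, y) = xy on A = [0, 1] × [0, 1] and the uniform k-by-k partition are chosen here so that the supremum and infimum on each component rectangle are attained at known corners; the helper name is hypothetical.

```python
def upper_lower_sums(k):
    """Return (U(P, f), L(P, f)) for f(x, y) = x*y on [0,1]^2 and the
    uniform partition into k*k component squares.  Since f is increasing
    in each variable on this domain, M_R(f) is attained at the upper-right
    corner of R and m_R(f) at the lower-left corner."""
    volume = (1.0 / k) ** 2          # |R| for every component rectangle
    upper = lower = 0.0
    for i in range(k):
        for j in range(k):
            lower += (i / k) * (j / k) * volume
            upper += ((i + 1) / k) * ((j + 1) / k) * volume
    return upper, lower

for k in (10, 100):
    u, l = upper_lower_sums(k)
    print(k, l, u, u - l)
```

For this integrand the gap U(P, f) − L(P, f) equals 1/k, so it tends to 0 as the partition is refined, and both sums bracket the value 1/4 of the integral.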


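Lemma 3.22. Suppose that $f : A \to \mathbb{R}$ has $m \le f(x) \le M$ for all x in A. 
Then for any partition P of A, 

\[
\begin{aligned}
m|A| &\le L(P, f) \le U(P, f) \le M|A|,\\
m|A| &\le \underline{\int_A} f\,dx \le M|A|,\\
m|A| &\le \overline{\int_A} f\,dx \le M|A|.
\end{aligned}
\]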


A refinement of a partition P of A is a partition P* such that every component 
rectangle for P* is a subset of a component rectangle for P. If $P = (P_1, \dots, P_n)$ 
and $P' = (P_1', \dots, P_n')$ are two partitions of A, then P and $P'$ have at least one 
common refinement $P^* = (P_1^*, \dots, P_n^*)$; specifically, for each j, we can take 
$P_j^*$ to be a common refinement of $P_j$ and $P_j'$. Arguing as in Lemma 1.25 and 
Theorem 1.26, we obtain the following two results. The key to the second one of 
these is the uniform continuity of any continuous function f : A — R; for the 
uniform continuity we appeal to the Heine—Borel Theorem (Corollary 2.37) and 
Proposition 2.41 in several variables, the corresponding one-variable result being 
Theorem 1.10. 




Lemma 3.23. Let $f : A \to \mathbb{R}$ satisfy $m \le f(x) \le M$ for all x in A. Then 


(a) $L(P, f) \le L(P^*, f)$ and $U(P^*, f) \le U(P, f)$ whenever P is a 
partition of A and $P^*$ is a refinement, 
(b) $L(P_1, f) \le U(P_2, f)$ whenever $P_1$ and $P_2$ are partitions of A, 
(c) $\underline{\int_A} f\,dx \le \overline{\int_A} f\,dx$, 
(d) $\overline{\int_A} f\,dx - \underline{\int_A} f\,dx \le (M - m)|A|$, 
(e) the function f is Riemann integrable on A if and only if for each $\varepsilon > 0$, 
there exists a partition P of A with $U(P, f) - L(P, f) < \varepsilon$. 


Theorem 3.24. If $f : A \to \mathbb{R}$ is continuous on A, then f is Riemann 
integrable on A. 


Next we argue as in Proposition 1.30 and Theorem 1.31 to obtain two more 
generalizations to several variables. The several-variable version of uniform 
continuity is needed in the proof of Proposition 3.25d. 


Proposition 3.25. If $f_1$ and $f_2$ are Riemann integrable on A, then 

(a) $f_1 + f_2$ is in $\mathcal{R}(A)$ and $\int_A (f_1 + f_2)\,dx = \int_A f_1\,dx + \int_A f_2\,dx$, 

(b) $cf_1$ is in $\mathcal{R}(A)$ and $\int_A cf_1\,dx = c\int_A f_1\,dx$ for any real number c, 

(c) $f_1 \le f_2$ on A implies $\int_A f_1\,dx \le \int_A f_2\,dx$, 

(d) $m \le f_1 \le M$ on A and $\varphi : [m, M] \to \mathbb{R}$ continuous imply that $\varphi \circ f_1$ is 
in $\mathcal{R}(A)$, 

(e) $|f_1|$ is in $\mathcal{R}(A)$, and $\left|\int_A f_1\,dx\right| \le \int_A |f_1|\,dx$, 

(f) $f_1^2$ and $f_1 f_2$ are in $\mathcal{R}(A)$, 

(g) $\sqrt{f_1}$ is in $\mathcal{R}(A)$ if $f_1 \ge 0$ on A. 


Theorem 3.26. If $\{f_n\}$ is a sequence of Riemann integrable functions on A 
and if $\{f_n\}$ converges uniformly to f on A, then f is Riemann integrable on A, 
and $\lim_n \int_A f_n\,dx = \int_A f\,dx$. 


There is also a several-variable version of Theorem 1.35, which says that 
Riemann integrability can be detected by convergence of Riemann sums as the mesh 
of the partition gets small. Relative to our standard partition $P = (P_1, \dots, P_n)$, 
select a member $t_R$ of each component rectangle R relative to P, and define 

\[
S(P, \{t_R\}, f) = \sum_R f(t_R)\,|R|.
\]

This is called a Riemann sum of f. 

Theorem 3.27. If f is Riemann integrable on A, then 

\[
\lim_{\mu(P) \to 0} S(P, \{t_R\}, f) = \int_A f\,dx.
\]

Conversely if f is bounded on A and if there exists a real number r such that for 
any $\varepsilon > 0$, there exists some $\delta > 0$ for which $|S(P, \{t_R\}, f) - r| < \varepsilon$ whenever 
$\mu(P) < \delta$, then f is Riemann integrable on A. 
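
The convergence of Riemann sums as the mesh shrinks can be observed numerically. In the Python sketch below, the integrand f(x, y) = x² + y on [0, 1] × [0, 1] (with integral 5/6), the uniform partitions, and the rule for choosing the points t_R are all assumptions made for illustration; the helper names are hypothetical.

```python
def riemann_sum(f, k, pick=0.5):
    """S(P, {t_R}, f) on [0,1]^2 for the uniform k-by-k partition, where
    t_R sits at relative position `pick` (between 0 and 1) along each side
    of the component rectangle R."""
    volume = (1.0 / k) ** 2
    total = 0.0
    for i in range(k):
        for j in range(k):
            total += f((i + pick) / k, (j + pick) / k) * volume
    return total

def f(x, y):
    return x * x + y

exact = 5.0 / 6.0
for k in (5, 20, 200):
    # Two different choices of the points t_R: centers and lower-left corners.
    print(k, riemann_sum(f, k, 0.5) - exact, riemann_sum(f, k, 0.0) - exact)
```

Both choices of the points t_R give sums approaching 5/6 as k grows, in line with the direct part of the theorem, although at different rates.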


REMARK. The proof of the direct part is more subtle in the several-variable 
case than in the one-variable case, and we therefore include it. The proof of the 
converse part closely imitates the proof of the converse part of Theorem 1.35, and 
we omit that. 


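PROOF. For the direct part the function f is assumed bounded; suppose 
$|f(x)| \le M$ on A. Let $\varepsilon > 0$ be given. Choose a partition $P^* = (P_1^*, \dots, P_n^*)$ 
of A with $U(P^*, f) \le \int_A f\,dx + \varepsilon$. Fix an integer k such that the number of 
component intervals of $P_j^*$ is $\le k$ for $1 \le j \le n$. Put 

\[
\delta_1 = \frac{\varepsilon}{Mk \sum_{j=1}^{n} \prod_{i \ne j} |A_i|},
\]

and suppose that $P = (P_1, \dots, P_n)$ is any partition of $A = A_1 \times \cdots \times A_n$ with 
$\mu(P) \le \delta_1$. For each j with $1 \le j \le n$, we separate the component intervals 
of $P_j$ into two kinds, the ones in $\mathcal{F}^{(j)}$ being the component intervals of $P_j$ that 
do not lie completely within a single component interval of $P_j^*$ and the ones in 
$\mathcal{G}^{(j)}$ being the rest. Similarly we separate the component rectangles of P into two 
kinds, the ones in $\mathcal{F}$ being the component rectangles that do not lie completely 
within a single component rectangle of $P^*$ and the ones in $\mathcal{G}$ being the rest. 

If $R = R_1 \times \cdots \times R_n$ is a member of $\mathcal{F}$, then $R_j$ is in $\mathcal{F}^{(j)}$ for some j with 
$1 \le j \le n$; let $j = j(R)$ be the first such index. Let $\mathcal{F}_j$ be the subset of R's in 
$\mathcal{F}$ with $j(R) = j$, so that $\mathcal{F} = \bigcup_{j=1}^{n} \mathcal{F}_j$ disjointly. Then we have 

\[
U(P, f) = \sum_{j=1}^{n} \sum_{R \in \mathcal{F}_j} M_R(f)\,|R| + \sum_{R \in \mathcal{G}} M_R(f)\,|R|. \tag{$*$}
\]

For the first term on the right side, 

\[
\begin{aligned}
\Bigl|\sum_{j=1}^{n} \sum_{R \in \mathcal{F}_j} M_R(f)\,|R|\Bigr|
&\le M \sum_{j=1}^{n} \sum_{R \in \mathcal{F}_j} |R| \\
&= M \sum_{j=1}^{n} \sum_{R_1 \times \cdots \times R_n \in \mathcal{F}_j} |R_1| \times \cdots \times |R_n| \\
&\le M \sum_{j=1}^{n} \sum_{R_j \in \mathcal{F}^{(j)}} |R_j| \prod_{i \ne j} |A_i|.
\end{aligned}
\]

Each member $R_j$ of $\mathcal{F}^{(j)}$ contains some point of the partition $P_j^*$ in its interior, 
and two distinct $R_j$'s cannot contain the same point. Thus the number of $R_j$'s in 
$\mathcal{F}^{(j)}$ is $\le k$. Also, $|R_j| \le \mu(P)$. Consequently we have 

\[
\Bigl|\sum_{j=1}^{n} \sum_{R \in \mathcal{F}_j} M_R(f)\,|R|\Bigr|
\le Mk\,\mu(P) \sum_{j=1}^{n} \prod_{i \ne j} |A_i|
\le Mk\,\delta_1 \sum_{j=1}^{n} \prod_{i \ne j} |A_i| \le \varepsilon.
\]

The contribution to $U(P, f)$ of the second term on the right side of $(*)$ is 

\[
\sum_{R \in \mathcal{G}} M_R(f)\,|R| = \sum_{R^*} \sum_{R \subseteq R^*} M_R(f)\,|R| \le \sum_{R^*} M_{R^*}(f)\,|R^*| \le U(P^*, f).
\]

Thus 

\[
U(P, f) \le \varepsilon + U(P^*, f) \le \int_A f\,dx + 2\varepsilon.
\]

Similarly we can define $\delta_2$ such that $\mu(P) \le \delta_2$ implies 

\[
L(P, f) \ge \int_A f\,dx - 2\varepsilon.
\]

If $\delta = \min\{\delta_1, \delta_2\}$ and $\mu(P) \le \delta$, then 

\[
\int_A f\,dx - 2\varepsilon \le L(P, f) \le S(P, \{t_R\}, f) \le U(P, f) \le \int_A f\,dx + 2\varepsilon
\]

for any choice of points $t_R$, and hence $|S(P, \{t_R\}, f) - \int_A f\,dx| \le 2\varepsilon$. This 
completes the proof of the direct part of the theorem. 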


Finally we include one simple interchange-of-limits result that is handy in 
working with integrals involving derivatives. 


Proposition 3.28. Let f be a complex-valued $C^1$ function defined on an open 
set U in $\mathbb{R}^n$, and let K be a compact subset of U. Then 


(a) the convergence of $\frac{1}{h}[f(x + he_j) - f(x)]$ to $\frac{\partial f}{\partial x_j}(x)$, as h tends to 0, is 
uniform for x in K, 

(b) the function $g(x_2, \dots, x_n) = \int_a^b f(x_1, \dots, x_n)\,dx_1$ is of class $C^1$ on 
the set of all points $y = (x_2, \dots, x_n)$ for which $[a, b] \times \{y\}$ lies in 
U, and $\frac{\partial}{\partial x_j} \int_a^b f(x)\,dx_1 = \int_a^b \frac{\partial f}{\partial x_j}(x)\,dx_1$ for $j \ne 1$ as long as the set 
$[a, b] \times \{(x_2, \dots, x_n)\}$ lies in U. 


PROOF. In (a), we may assume that f is real-valued. The Mean Value Theorem 
gives 

\[
\frac{1}{h}[f(x + he_j) - f(x)] - \frac{\partial f}{\partial x_j}(x) = \frac{\partial f}{\partial x_j}(x + te_j) - \frac{\partial f}{\partial x_j}(x)
\]

for some t between 0 and h, and then (a) follows from the uniform continuity of 
$\partial f/\partial x_j$ on K. Conclusion (b) follows by combining (a) and Theorem 1.31. 

As we did in the one-variable case in Sections 3 and I.5, we can extend our 
results concerning integration in several variables to functions with values in $\mathbb{R}^m$ 
or $\mathbb{C}^m = \mathbb{R}^{2m}$. Integration of a vector-valued function is defined entry by entry, 
and then all the results from Theorem 3.24 through Proposition 3.28 extend. The 
one thing that needs separate proof is the inequality $\left|\int_A f_1\,dx\right| \le \int_A |f_1|\,dx$ of 
Proposition 3.25e, and a proof can be carried out in the same way as at the end 
of Section 3 in the one-variable case. 


8. Riemann Integrable Functions 


Let E be a subset of $\mathbb{R}^n$. We say that E is of measure 0 if for any $\varepsilon > 0$, E can 
be covered by a finite or countably infinite set of closed rectangles in the sense 
of Section 7 of total volume less than $\varepsilon$. It is equivalent to require that E can be 
covered by a finite or countably infinite set of open rectangles of total volume 
less than $\varepsilon$. In fact, if a system of open rectangles covers E, then the system of 
closures covers E and has the same total volume; conversely if a system of closed 
rectangles covers E, then the system of open rectangles with the same centers 
and with sides expanded by a factor $1 + \delta$ covers E as long as $\delta > 0$. 

Several properties of sets of measure 0 are evident: a set consisting of one 
point is of measure 0, a face of a closed rectangle is a set of measure 0, and any 
subset of a set of measure 0 is of measure 0. Less evident is the fact that the 
countable union of sets of measure 0 is of measure 0. In fact, if $\varepsilon > 0$ is given 
and if $E_1, E_2, \dots$ are sets of measure 0, find finite or countably infinite systems 
$\mathcal{R}_j$ of closed rectangles for $j \ge 1$ such that the total volume of the members of 
$\mathcal{R}_j$ is $< \varepsilon/2^j$. Then $\mathcal{R} = \bigcup_j \mathcal{R}_j$ is a system of closed rectangles covering $\bigcup_j E_j$ 
and having total volume $< \varepsilon$. 
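
The device of shrinking the j-th cover geometrically can be tried out on a countable set in one dimension. The Python sketch below is an illustration only: it covers the first finitely many rationals in [0, 1], in an enumeration chosen here by increasing denominator, by closed intervals whose lengths are ε/2^{j+1}; the helper names and this particular choice of lengths are assumptions, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def first_rationals(count):
    """First `count` distinct rationals in [0, 1], by increasing denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Closed interval of length eps / 2^(j+1) centered at the j-th point,
    for j = 1, 2, ...; the total length stays below eps / 2."""
    eps = Fraction(eps)
    intervals = []
    for j, x in enumerate(points, start=1):
        half = eps / 2 ** (j + 2)      # half-length of the j-th interval
        intervals.append((x - half, x + half))
    return intervals

points = first_rationals(50)
intervals = cover(points, Fraction(1, 100))
total = sum(b - a for a, b in intervals)
print(float(total))                    # below 1/100 however many points are used
```

The bound on the total length does not depend on how many points are covered, which is what allows the same construction to handle a countably infinite set.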

The goal of this section is to prove the following theorem, which gives a 
useful necessary and sufficient condition for a function of several variables to be 
Riemann integrable. The theorem immediately extends from the scalar-valued 
case as stated to the case that f has values in $\mathbb{R}^m$ or $\mathbb{C}^m$. 


Theorem 3.29. Let A be a finite closed rectangle in $\mathbb{R}^n$ of positive volume, 
and let $f : A \to \mathbb{R}$ be a bounded function. Then f is Riemann integrable if and 
only if the set 

\[
B = \{x \mid f \text{ is not continuous at } x\}
\]

has measure 0. 

Theorem 3.29 supplies the reassurance that a finite closed rectangle of positive 
volume cannot have measure 0. In fact, the function f on A that is 1 at every point 
with all coordinates rational and is 0 elsewhere is discontinuous everywhere on 
A. By inspection every $U(P, f)$ is $|A|$ for this f, and every $L(P, f)$ is 0; thus f 
is not Riemann integrable. The theorem then implies that A is not of measure 0. 

The proof of the theorem will make use of an auxiliary notion, that of “con- 
tent 0,” in order to simplify the process of checking whether a given compact 
set has measure 0. A subset E of $\mathbb{R}^n$ has content 0 if for any $\varepsilon > 0$, E can 
be covered by a finite set of closed rectangles in the sense of Section 7 of total 
volume less than $\varepsilon$. It is equivalent to require that E can be covered by a finite 
set of open rectangles of total volume less than $\varepsilon$. A set consisting of one point 
is of content 0, a face of a closed rectangle is a set of content 0, any subset of a 
set of content 0 is of content 0, and the union of finitely many sets of content 0 is 
of content 0. 

Every set of content 0 is certainly of measure 0, but the question of any converse 
relationship is more subtle. Consider the set E of rationals in [0, 1] as a subset 
of $\mathbb{R}^1$. Since this set is a countable union of one-point sets, it has measure 0. 
However, it does not have content 0. In fact, if we were to have $E \subseteq \bigcup_{j=1}^{n} [a_j, b_j]$ 
with $\sum_{j=1}^{n} (b_j - a_j) < \varepsilon$, then we would have $E^{\mathrm{cl}} \subseteq \bigcup_{j=1}^{n} [a_j, b_j]$ by Proposition 
2.10, since $\bigcup_{j=1}^{n} [a_j, b_j]$ is closed. Then $E^{\mathrm{cl}}$ would have content 0 and necessarily 
measure 0. This contradicts the fact, observed after the statement of the theorem, 
that a closed rectangle of positive volume, such as $E^{\mathrm{cl}} = [0, 1]$ in $\mathbb{R}^1$, cannot have 
measure 0. We conclude that a bounded set of measure 0 need not have content 0. 


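Lemma 3.30. If E is a compact subset of $\mathbb{R}^n$ of measure 0, then E is of 
content 0. 


PROOF. Let E be of measure 0, and let $\varepsilon > 0$ be given. Choose open rectangles 
$E_j$ with $E \subseteq \bigcup_{j=1}^{\infty} E_j$ and $\sum_{j=1}^{\infty} |E_j| < \varepsilon$. By compactness, $E \subseteq \bigcup_{j=1}^{N} E_j$ for 
some N. Then $\sum_{j=1}^{N} |E_j| < \varepsilon$. Since $\varepsilon$ is arbitrary, E has content 0. 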


Recall from Section II.9 the notion of the oscillation of a function. For a 
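function $f : A \to \mathbb{R}$, the oscillation at $x_0$ is given by 

\[
\operatorname{osc}_f(x_0) = \lim_{\delta \downarrow 0} \sup_{x \in B(\delta; x_0)} |f(x) - f(x_0)|.
\]

The oscillation is 0 at $x_0$ if and only if f is continuous there. Lemma 2.55 tells 
us that 

\[
\{x \in U \mid \operatorname{osc}_g(x) \ge 2\varepsilon\}^{\mathrm{cl}} \subseteq \{x \in U \mid \operatorname{osc}_g(x) \ge \varepsilon\}
\]

for any $\varepsilon > 0$. 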


Lemma 3.31. Let A be a nontrivial closed rectangle in $\mathbb{R}^n$, and let $f : A \to \mathbb{R}$ 
be a bounded function with $\operatorname{osc}_f(x) < \varepsilon$ for all x in A. Then there is a partition 
P of A with $U(P, f) - L(P, f) \le 2\varepsilon|A|$. 

PROOF. For each $x_0$ in A, there is an open rectangle $U_{x_0}$ centered at $x_0$ such 
that $|f(x) - f(x_0)| < \varepsilon$ on $A \cap U_{x_0}^{\mathrm{cl}}$. Then $M_{A \cap U_{x_0}^{\mathrm{cl}}}(f) - m_{A \cap U_{x_0}^{\mathrm{cl}}}(f) \le 2\varepsilon$. These open 
rectangles cover A. By compactness a finite number of them suffice to cover A. 
Write $A \subseteq U_{x_1} \cup \cdots \cup U_{x_m}$, accordingly. Let P be the partition of A generated by 
the endpoints in each coordinate of A and the endpoints of the closed rectangles 
$U_{x_j}^{\mathrm{cl}}$; we discard endpoints that lie outside A. Each component rectangle R of P 
then lies completely within some $U_{x_j}^{\mathrm{cl}}$, and we have $M_R(f) - m_R(f) \le 2\varepsilon$ for 
each component rectangle R of P. Therefore 

\[
U(P, f) - L(P, f) = \sum_R (M_R(f) - m_R(f))\,|R| \le 2\varepsilon \sum_R |R| = 2\varepsilon|A|.
\]


PROOF OF THEOREM 3.29. Define $B_\epsilon = \{x \mid \operatorname{osc}_f(x) \ge \epsilon\}$ for each $\epsilon > 0$, 
so that $B = \bigcup_{n=1}^{\infty} B_{1/n}$. For the easy direction of the proof, suppose that $f$ is 
Riemann integrable. We show that $B_{1/n}$ has content 0 for all $n$. Since content 0 
implies measure 0, $B_{1/n}$ will have measure 0 for all $n$. So will the countable union, 
and therefore $B$ will have measure 0. 

Given $\epsilon > 0$ and $n$, use Lemma 3.23e to choose a partition $P$ of $A$ with 
$U(P, f) - L(P, f) < \epsilon/n$. Let 

\[ \mathcal{R} = \{\text{component rectangles } R \text{ of } P \mid R^o \cap B_{1/n} \ne \varnothing\}, \]

where $R^o$ is the interior of $R$. Then $B_{1/n}$ is covered by the closed rectangles in $\mathcal{R}$ 
and the boundaries of the component rectangles of $P$. The latter are of content 0. 
For $R$ in $\mathcal{R}$, let us see that $M_R(f) - m_R(f) \ge 1/n$. In fact, if $x_0$ is in $R^o \cap B_{1/n}$, 
then $\operatorname{osc}_f(x_0) \ge 1/n$, so that $\lim_{\delta \downarrow 0} \sup_{|x - x_0| < \delta} |f(x) - f(x_0)| \ge 1/n$ and 

\[ \sup_{|x - x_0| < \delta} |f(x) - f(x_0)| \ge 1/n \quad \text{for all } \delta > 0. \]

Therefore $M_R(f) - m_R(f) \ge 1/n$. Summing on $R \in \mathcal{R}$ gives 

\[ \frac{1}{n} \sum_{R \in \mathcal{R}} |R| \le \sum_{R \in \mathcal{R}} \bigl(M_R(f) - m_R(f)\bigr)|R| \le \sum_{\text{all } R} \bigl(M_R(f) - m_R(f)\bigr)|R| = U(P, f) - L(P, f) \le \epsilon/n, \]

and thus $\sum_{R \in \mathcal{R}} |R| \le \epsilon$. Consequently $B_{1/n}$ has content 0, as asserted. 

For the converse direction of the proof, suppose that $B$ has measure 0. We 
are to prove that $f$ is Riemann integrable. Let $\epsilon > 0$ be given. The inclusion 
of Lemma 2.55 gives $B_\epsilon^{\mathrm{cl}} \subseteq B_{\epsilon/2} \subseteq B$, and thus $B_\epsilon^{\mathrm{cl}}$ has measure 0. The set 
$B_\epsilon^{\mathrm{cl}}$ is compact, and Lemma 3.30 shows that it has content 0. Hence the subset 
$B_\epsilon$ has content 0. Choose open rectangles $U_1, \dots, U_m$ such that $B_\epsilon \subseteq \bigcup_{j=1}^{m} U_j$ 
and $\sum_{j=1}^{m} |U_j| < \epsilon$. Form the partition $P$ of $A$ generated by the endpoints in 
each coordinate of $A$ and the endpoints of the closed rectangles $U_j^{\mathrm{cl}}$; we discard 
endpoints that lie outside $A$. 

Then every component closed rectangle $R$ of $P$ is in one of the two classes 

\[ \mathcal{R}_1 = \{R \mid R \subseteq U_j^{\mathrm{cl}} \text{ for some } j\}, \qquad \mathcal{R}_2 = \{R \mid R \cap B_\epsilon = \varnothing\}. \]

In fact, our definition is such that $R \cap U_j \ne \varnothing$ implies $R \subseteq U_j^{\mathrm{cl}}$. If $R \cap B_\epsilon \ne \varnothing$, 
let $x_0$ be in $R \cap B_\epsilon$. Then $x_0$ is in some $U_j$, and $R \cap U_j \ne \varnothing$ for that $j$. Hence $R$ 
is in $\mathcal{R}_1$. 

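We shall construct a particular refinement $P'$ of $P$ in a moment. Let $R'$ be a 
typical component rectangle of $P'$. For any refinement $P'$ of $P$, we have 

\[
\begin{aligned}
U(P', f) - L(P', f)
&\le \sum_{R \in \mathcal{R}_1} \sum_{R' \subseteq R} \bigl(M_{R'}(f) - m_{R'}(f)\bigr)|R'| + \sum_{R \in \mathcal{R}_2} \sum_{R' \subseteq R} \bigl(M_{R'}(f) - m_{R'}(f)\bigr)|R'| \\
&\le 2\bigl(\sup_A |f|\bigr) \sum_{R \in \mathcal{R}_1} \sum_{R' \subseteq R} |R'| + \sum_{R \in \mathcal{R}_2} \sum_{R' \subseteq R} \bigl(M_{R'}(f) - m_{R'}(f)\bigr)|R'| \\
&\le 2\bigl(\sup_A |f|\bigr)\epsilon + \sum_{R \in \mathcal{R}_2} \sum_{R' \subseteq R} \bigl(M_{R'}(f) - m_{R'}(f)\bigr)|R'|
\end{aligned}
\]

since $\sum_{j=1}^{m} |U_j| < \epsilon$. For $R$ in $\mathcal{R}_2$, we have $\operatorname{osc}_f(x) < \epsilon$ for all $x$ in $R$. Lemma 
3.31 shows that there is a partition $P_R$ of $R$ such that $U(P_R, f) - L(P_R, f) \le 2\epsilon|R|$. 
In other words, $\sum_{R' \subseteq R} \bigl(M_{R'}(f) - m_{R'}(f)\bigr)|R'| \le 2\epsilon|R|$ if $P'$ is fine 
enough to include all the $n$-tuples of $P_R$. If $P'$ is fine enough so that this happens 
for all $R$ in $\mathcal{R}_2$, then we obtain 

\[ U(P', f) - L(P', f) \le 2\bigl(\sup_A |f|\bigr)\epsilon + \sum_{R \in \mathcal{R}_2} 2\epsilon|R| \le 2\epsilon\bigl(\sup_A |f| + |A|\bigr), \]

and the theorem follows. 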


9. Fubini’s Theorem for the Riemann Integral 


Fubini’s Theorem is a result asserting that a double integral is equal to an iterated 
integral in either order. An unfortunate feature of the Riemann integral is that 
when an integrable function f(x, y) is restricted to one of the two variables, 
then the resulting function of that variable need not be integrable. Thus a certain 
amount of checking is often necessary in using the theorem. This feature is 
corrected in the Lebesgue integral, and that, as we shall see in Chapter V, is one 
of the strengths of the Lebesgue integral. 
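Theorem 3.32 (Fubini’s Theorem). Let $A \subseteq \mathbb{R}^n$ and $B \subseteq \mathbb{R}^m$ be closed 
rectangles, and let $f : A \times B \to \mathbb{R}$ be Riemann integrable. For $x$ in $A$ let $f_x$ be 
the function $y \mapsto f(x, y)$ for $y$ in $B$, and define 

\[ \mathcal{L}(x) = \underline{\int_B} f_x(y)\,dy = \underline{\int_B} f(x, y)\,dy, \]

\[ \mathcal{U}(x) = \overline{\int_B} f_x(y)\,dy = \overline{\int_B} f(x, y)\,dy, \]

as functions on $A$. Then $\mathcal{L}$ and $\mathcal{U}$ are Riemann integrable on $A$ and 

\[ \int_{A \times B} f\,dx\,dy = \int_A \mathcal{L}(x)\,dx = \int_A \Bigl[\underline{\int_B} f(x, y)\,dy\Bigr] dx, \]

\[ \int_{A \times B} f\,dx\,dy = \int_A \mathcal{U}(x)\,dx = \int_A \Bigl[\overline{\int_B} f(x, y)\,dy\Bigr] dx. \]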


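PROOF. Let $P$ be a partition of the form $(P_A, P_B)$, and let $R = R_A \times R_B$ be a 
typical component rectangle of $P$. Then 

\[ L(P, f) = \sum_R m_R(f)|R| = \sum_{R_A} \Bigl(\sum_{R_B} m_{R_A \times R_B}(f)|R_B|\Bigr)|R_A|. \]

For $x$ in $R_A$, $m_{R_A \times R_B}(f) \le m_{R_B}(f_x)$. Hence $x$ in $R_A$ implies 

\[ \sum_{R_B} m_{R_A \times R_B}(f)|R_B| \le \sum_{R_B} m_{R_B}(f_x)|R_B| \le \underline{\int_B} f_x\,dy = \mathcal{L}(x). \]

Taking the infimum over $x$ in $R_A$ and summing over $R_A$ gives 

\[ L(P, f) = \sum_{R_A} \Bigl(\sum_{R_B} m_{R_A \times R_B}(f)|R_B|\Bigr)|R_A| \le \sum_{R_A} m_{R_A}(\mathcal{L})|R_A| = L(P_A, \mathcal{L}). \]

Similarly 

\[ U(P_A, \mathcal{U}) \le U(P, f). \]

Thus 

\[ L(P, f) \le L(P_A, \mathcal{L}) \le U(P_A, \mathcal{L}) \le U(P_A, \mathcal{U}) \le U(P, f). \]

Since $f$ is Riemann integrable, the ends of the above display can be made close 
together by choosing $P$ appropriately. The second and third members of the 
display will then be close, and hence 

\[ \int_{A \times B} f\,dx\,dy = \underline{\int_A} \mathcal{L}\,dx = \overline{\int_A} \mathcal{L}\,dx. \]

The result for $\mathcal{L}$ follows. The result for $\mathcal{U}$ follows in similar fashion immediately 
from the inequalities 

\[ L(P, f) \le L(P_A, \mathcal{L}) \le L(P_A, \mathcal{U}) \le U(P_A, \mathcal{U}) \le U(P, f). \]

This proves the theorem. 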
REMARKS. 

(1) Equality of the double integral with the iterated integral in the other order 
is the same theorem. Thus the iterated integrals in the two orders are equal. 

(2) If $f$ is continuous on $A \times B$, then $f_x$ is continuous on $B$ as a consequence 
of Corollary 2.27, so that $\underline{\int_B} f(x, y)\,dy = \int_B f(x, y)\,dy$. Hence 

\[ \int_{A \times B} f\,dx\,dy = \int_A \Bigl[\int_B f(x, y)\,dy\Bigr] dx \]

when $f$ is continuous on $A \times B$. This result is isolated as Corollary 3.33 below. 
Evidently it immediately extends to continuous functions with values in $\mathbb{R}^k$ or 
$\mathbb{C}^k$. 

(3) In practice one often considers integrals of the form $\int_U f(x, y)\,dx\,dy$ 
for some open set $U$, where $f$ is continuous on some closed rectangle $A \times B$ 
containing $U$. Then the double integral equals $\int_{A \times B} f(x, y) I_U(x, y)\,dx\,dy$, 
where $I_U$ is the indicator function² of $U$ equal to 1 on $U$ and 0 off $U$. In 
many applications the functions $(I_U)_x$ have harmless discontinuities and $(f I_U)_x$ 
is therefore Riemann integrable as a function of $y$. In this case, the upper and 
lower integrals can again be dropped in the statement of Theorem 3.32. 


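Corollary 3.33 (Fubini’s Theorem for continuous integrand). Let $A \subseteq \mathbb{R}^n$ 
and $B \subseteq \mathbb{R}^m$ be closed rectangles, and let $f : A \times B \to \mathbb{R}$ be continuous. Then 

\[ \int_{A \times B} f\,dx\,dy = \int_A \Bigl[\int_B f(x, y)\,dy\Bigr] dx = \int_B \Bigl[\int_A f(x, y)\,dx\Bigr] dy. \]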
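As an informal numerical illustration of Corollary 3.33 (not part of the proof), the following sketch compares the two iterated integrals of a continuous function on the unit square by midpoint Riemann sums. The integrand $f(x, y) = x e^{xy}$, the helper name `midpoint_sum`, and the grid size are assumptions chosen for illustration only; for this integrand the common value is $e - 2$.

```python
import math

def midpoint_sum(g, a, b, n):
    """Midpoint Riemann sum of g over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    # Hypothetical continuous integrand on A x B = [0, 1] x [0, 1].
    return x * math.exp(x * y)

n = 200  # illustrative resolution

# Inner integral in y, outer integral in x.
dy_then_dx = midpoint_sum(
    lambda x: midpoint_sum(lambda y: f(x, y), 0.0, 1.0, n), 0.0, 1.0, n)

# Inner integral in x, outer integral in y.
dx_then_dy = midpoint_sum(
    lambda y: midpoint_sum(lambda x: f(x, y), 0.0, 1.0, n), 0.0, 1.0, n)

# Closed form: the inner y-integral is e^x - 1, whose integral over [0, 1] is e - 2.
exact = math.e - 2.0

print(dy_then_dx, dx_then_dy, exact)
```

Both approximations use the same grid, so they agree up to rounding, and each approaches $e - 2$ as the grid is refined.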


10. Change of Variables for the Riemann Integral 


The goal in this section is to prove a several-variables generalization of the one- 
variable formula 

\[ \int_a^b f(x)\,dx = \int_A^B f(\varphi(y))\varphi'(y)\,dy \]

given in Theorem 1.34. In the one-variable case we assumed in effect that $\varphi$ 
was a strictly increasing function of class $C^1$ on $[a, b]$ and that $f$ was merely 
Riemann integrable. The several-variables theorem in this section will be only 
a preliminary result, with a final version stated and proved in Chapter VI in the 


²Indicator functions are called “characteristic functions” by many authors, but the term 
“characteristic function” has another meaning in probability theory and is best avoided as a substitute for 
“indicator function” in any context where probability might play a role. 
context of the Lebesgue integral. In particular we shall assume in the present 
section that $f$ is continuous and that it vanishes near the boundary of the domain, 
and we shall make strong assumptions about $\varphi$. To capture succinctly the notion 
that $f$ vanishes near the boundary of its domain, we introduce the notion of the 
support of $f$, which is the closure of the set where $f$ is nonzero. 


Theorem 3.34 (change-of-variables formula). Let $\varphi$ be a one-one function of 
class $C^1$ from an open subset $U$ of $\mathbb{R}^n$ onto an open subset $\varphi(U)$ of $\mathbb{R}^n$ such that 
$\det \varphi'(x)$ is nowhere 0. Then 

\[ \int_{\varphi(U)} f(y)\,dy = \int_U f(\varphi(x))\,|\det \varphi'(x)|\,dx \]

for every continuous function $f : \varphi(U) \to \mathbb{R}$ whose support is a compact subset 
of $\varphi(U)$. 


Before a discussion of the sense in which this result has to be regarded as 
preliminary, a few remarks are in order. The function $\varphi'$ is the usual derivative of 
$\varphi$, and $\varphi'(x)$ is therefore a linear function from $\mathbb{R}^n$ to $\mathbb{R}^n$ that depends on $x$. 
The matrix of the linear function $\varphi'(x)$ is the Jacobian matrix $[\partial \varphi_i / \partial x_j]$, and 
$\det \varphi'(x)$ is the determinant of this matrix. In classical notation, this determinant 
is often written as $\dfrac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)}$, and then the effect on the integral of changing 
variables can be summarized by the formula $dy = \left|\dfrac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)}\right| dx$. The 
absolute value signs did not appear in the one-variable formula in Theorem 1.34, 
but the assumption that $\varphi$ was strictly increasing made them unnecessary, $\varphi'(x)$ 
being $\ge 0$. Had we worked with strictly decreasing $\varphi$, we would have assumed 
$\varphi'(x) \le 0$ everywhere, and the limits of integration on one side of the formula 
would have been reversed from their natural order. The minus sign introduced 
by putting the limits of integration in their natural order would have compensated 
for a minus sign introduced in changing $\varphi'(x)$ to $|\varphi'(x)|$. 

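The hypotheses on $\varphi$ make the Inverse Function Theorem (Theorem 3.17) 
applicable at every $x$ in $U$. Consequently $\varphi(U)$ is automatically open, and $\varphi$ has 
a locally defined $C^1$ inverse function about each point $\varphi(x)$ of the image. Since 
$\varphi$ has been assumed to be one-one, $\varphi : U \to \varphi(U)$ has a global inverse function 
$\varphi^{-1}$ of class $C^1$. 

We can use $\varphi^{-1}$ to verify that $f \circ \varphi$ has compact support in $U$. To the equality 
$\varphi(\{x \in U \mid f(\varphi(x)) \ne 0\}) = \{y \in \varphi(U) \mid f(y) \ne 0\}$, we apply $\varphi^{-1}$ and obtain 
$\{x \in U \mid f(\varphi(x)) \ne 0\} = \varphi^{-1}(\{y \in \varphi(U) \mid f(y) \ne 0\})$. Hence 

\[ \{x \in U \mid f(\varphi(x)) \ne 0\}^{\mathrm{cl}} = \bigl(\varphi^{-1}(\{y \in \varphi(U) \mid f(y) \ne 0\})\bigr)^{\mathrm{cl}}. \]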
The identity $F(E^{\mathrm{cl}}) \subseteq (F(E))^{\mathrm{cl}}$ holds whenever $F$ is a continuous function 
between two metric spaces, by Proposition 2.25. When $E^{\mathrm{cl}}$ is compact, equality 
actually holds. The reason is that Propositions 2.34 and 2.38 show $F(E^{\mathrm{cl}})$ to 
be closed; since $F(E^{\mathrm{cl}})$ is a closed set containing $F(E)$, it contains $(F(E))^{\mathrm{cl}}$. 
Applying this fact to the displayed equation above, we obtain 

\[ \{x \in U \mid f(\varphi(x)) \ne 0\}^{\mathrm{cl}} = \varphi^{-1}\bigl(\{y \in \varphi(U) \mid f(y) \ne 0\}^{\mathrm{cl}}\bigr). \]

In other words, 

\[ \operatorname{support}(f \circ \varphi) = \varphi^{-1}(\operatorname{support}(f)). \]

Applying Proposition 2.38 a second time, we see that $f \circ \varphi$ has compact support. 
As a result, we can rewrite the formula to be proved in Theorem 3.34 as 

\[ \int_{\mathbb{R}^n} f(y)\,dy = \int_{\mathbb{R}^n} f(\varphi(x))\,|\det \varphi'(x)|\,dx, \]

and the supports will take care of themselves in the proof. 

The result of Theorem 3.34 has to be regarded as preliminary. To understand 
the sense in which the result is limited, consider the case of polar coordinates in 
$\mathbb{R}^2$. In this case we can take 

\[ U = \{(r, \theta) \mid 0 < r < +\infty \text{ and } 0 < \theta < 2\pi\}, \qquad \varphi(r, \theta) = (r\cos\theta,\; r\sin\theta), \]

and we have 

\[ \varphi(U) = \mathbb{R}^2 - \{(x, 0) \mid x \ge 0\}. \]

We readily compute that $\det \varphi'(r, \theta) = r$, and the desired formula is 

\[ \int_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_{0 < r < \infty,\; 0 < \theta < 2\pi} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta. \]

At first glance this formula seems fine. But if we refer to the precise hypotheses, 
we see that $f$ is assumed to vanish in a neighborhood of the set of points $(x, 0)$ with 
$x \ge 0$, as well as when $(x, y)$ is sufficiently far from the origin. Without some 
sort of passage to the limit, the theorem therefore settles few cases of interest. 
This passage to the limit will be accomplished easily with the Lebesgue integral, 
and we therefore postpone the final form of the change-of-variables formula to 
Chapter VI. 
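The polar-coordinate formula can be checked numerically in a simple case. The sketch below is only an illustration under stated assumptions: the integrand $f(x, y) = e^{-(x^2 + y^2)}$ is a hypothetical choice that does not have compact support, so it is exactly the kind of function that Theorem 3.34 does not cover without a passage to the limit, and the truncation radius `R` and grid size are arbitrary. Both sides should come out close to $\pi$.

```python
import math

def midpoint_2d(g, a1, b1, a2, b2, n):
    """Midpoint Riemann sum of g(s, t) over [a1, b1] x [a2, b2] on an n-by-n grid."""
    h1 = (b1 - a1) / n
    h2 = (b2 - a2) / n
    total = 0.0
    for i in range(n):
        s = a1 + (i + 0.5) * h1
        for j in range(n):
            t = a2 + (j + 0.5) * h2
            total += g(s, t)
    return total * h1 * h2

def f(x, y):
    # Hypothetical integrand; it decays so fast that truncating at |x|, |y| <= 6 loses almost nothing.
    return math.exp(-(x * x + y * y))

R = 6.0   # truncation radius (assumption for the illustration)
n = 300   # grid size (assumption for the illustration)

# Left side: integral of f over the square [-R, R]^2 in Cartesian coordinates.
cartesian = midpoint_2d(f, -R, R, -R, R, n)

# Right side: integral of f(r cos t, r sin t) * r over 0 < r < R, 0 < t < 2*pi.
polar = midpoint_2d(
    lambda r, t: f(r * math.cos(t), r * math.sin(t)) * r,
    0.0, R, 0.0, 2.0 * math.pi, n)

print(cartesian, polar, math.pi)
```

The factor `r` in the second integrand is the Jacobian determinant $\det \varphi'(r, \theta) = r$ from the text; dropping it makes the two sides disagree.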
In any event, we shall use Theorem 3.34 in proving the final change-of-variables 
formula, and thus a proof is warranted now. Before coming to the 
formal proof, it is well to understand the mechanism of the theorem. The proof 
will then flow easily from the analysis that is done for motivation. 

The motivation for the theorem comes from taking $f$ to be the constant function 
1 and from thinking of $\varphi$ as of the form $\varphi(y) = y_0 + L(y - y_0)$ with $L$ linear. 
In $\mathbb{R}^3$, if we take $U$ to be the cube $\{y = (y_1, y_2, y_3) \mid 0 < y_i < 1 \text{ for all } i\}$, 
along with $f = 1$, the formula asserts that $\varphi(U)$ has volume $|\det L|$. This is just 
the well-known fact about 3-by-3 matrices that the volume of the parallelepiped 
with sides $u, v, w$ is the scalar $|(u \times v) \cdot w|$. For a corresponding result in $\mathbb{R}^n$, 
where vector product is not available, the relationship between the determinant 
and a volume has to be argued differently. One way of proceeding in $\mathbb{R}^n$ is to use 
row or column reduction to write the given matrix as the product of elementary 
matrices (those corresponding to the effect of a single step in the reduction), to 
check the change of variables for each factor, and to use the multiplication formula 
$\det(AB) = \det A \det B$ to obtain the result. This argument can be adjusted so as 
to work with a function $f$ in place; the elementary matrices that interchange two 
variables are handled by Fubini’s Theorem (Theorem 3.32 or Corollary 3.33), and 
the other elementary matrices are handled by the one-variable change-of-variables 
formula (Theorem 1.34). 
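The 3-by-3 fact quoted above is easy to check in a concrete case. The following sketch, with hypothetical vectors `u`, `v`, `w` chosen only for illustration, computes the scalar triple product $(u \times v) \cdot w$ and the determinant of the matrix $L$ whose columns are $u, v, w$, and confirms that they agree.

```python
def cross(u, v):
    """Vector product u x v in R^3."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    """Dot product in R^3."""
    return sum(a * b for a, b in zip(u, v))

def det3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical edge vectors of a parallelepiped (illustration only).
u = [1.0, 2.0, 0.0]
v = [0.0, 1.0, 3.0]
w = [2.0, 0.0, 1.0]

# L carries the unit cube to the parallelepiped with sides u, v, w: its columns are u, v, w.
L = [[u[i], v[i], w[i]] for i in range(3)]

triple = dot(cross(u, v), w)
volume = abs(triple)

print(triple, det3(L), volume)
```

For these vectors both the triple product and $\det L$ equal 13, so the image of the unit cube has volume 13.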

That being the case, one can envision a proof of Theorem 3.34 that proceeds 
by approximation, using Taylor’s Theorem (Theorem 3.11), at least if f is of 
class $C^2$. The contribution to the integrand from the integral remainder term in 
the Taylor expansion of $\varphi$ is to be estimated as an error term. The approximation 
generates an additional error term because the image of $U$ under $\varphi$ does not 
match the image of $U$ under the approximating first-order expansion of $\varphi$. Of 
course, one cannot expect the approximation to be very good far away from the 
point where the Taylor expansion is centered, and thus the argument needs to be 
carried out locally. The local contributions can then be pieced together by using a 
partition of unity. Such an argument can actually be carried out, but the argument 
is lengthy. 

A more economical argument comes by finding a nonlinear analog of row or 
column reduction. The Inverse Function Theorem will allow us to prove that a 
general $\varphi$ decomposes into suitably defined nonlinear elementary transformations, 
but the decomposition is valid only locally. A partition of unity is used to piece 
together the local results and obtain the theorem. We introduce two kinds of 
nonlinear elementary transformations: 

(i) a flip $\beta$, which interchanges two coordinates. This is a linear function, 
and it satisfies $|\det \beta'(x)| = 1$ for all $x$. Application of Fubini’s Theorem 
in the form of Corollary 3.33 shows that Theorem 3.34 is valid when $\varphi$ 
is a flip. 
(ii) a primitive mapping 

\[ \psi(x_1, \dots, x_n) = \begin{pmatrix} x_1 \\ \vdots \\ x_{i-1} \\ g(x_1, \dots, x_n) \\ x_{i+1} \\ \vdots \\ x_n \end{pmatrix}, \]

where $g$ is real-valued and occurs in a single entry. If that entry is the $i^{\text{th}}$ 
entry, then the Jacobian matrix of $\psi$ is the identity matrix except in the $i^{\text{th}}$ 
row, where the entries are $\frac{\partial g}{\partial x_1}, \dots, \frac{\partial g}{\partial x_n}$. Hence $|\det \psi'(x)| = \bigl|\frac{\partial g}{\partial x_i}\bigr|$. To 
prove Theorem 3.34 for a primitive mapping of this kind, it is enough to 
handle $i = 1$. If we write $x = (x_1, x')$ and $y = (y_1, x')$ with $x'$ in $\mathbb{R}^{n-1}$, 
Fubini’s Theorem (Corollary 3.33) reduces matters to showing that 

\[ \int_{\mathbb{R}^{n-1}} \Bigl[\int_{\mathbb{R}} f(y_1, x')\,dy_1\Bigr] dx' = \int_{\mathbb{R}^{n-1}} \Bigl[\int_{\mathbb{R}} f(g(x_1, x'), x') \Bigl|\frac{\partial g}{\partial x_1}(x_1, x')\Bigr|\,dx_1\Bigr] dx' \]

under suitable hypotheses on $g$, and it is enough to prove that the inner 
integrals are equal for all $x'$. Theorem 1.34 yields the equality of the inner 
integrals if $g$ is a $C^1$ function for which $g(x_1, x')$ is defined for $x_1$ in an 
interval for any relevant $x'$, and if $\bigl|\frac{\partial g}{\partial x_1}(x_1, x')\bigr|$ is everywhere positive at 
the points in question. 


In the linear case a primitive mapping $\psi$ for which $g(x)$ appears in the $i^{\text{th}}$ 
entry is given by a matrix that is the identity except in the $i^{\text{th}}$ row. For $\psi'$ to be 
nonvanishing, the diagonal entry in the $i^{\text{th}}$ row must be nonzero. This kind of 
matrix is not always elementary but is the product of $n$ elementary matrices. 
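The determinant claim for a primitive mapping, $\det \psi'(x) = \partial g / \partial x_i$, can be illustrated numerically. The sketch below is an assumption-laden example rather than anything from the text: the mapping `psi` on $\mathbb{R}^3$, the function $g(x) = x_1 x_2 + \sin x_3 + 2x_2$ in the second entry, the sample point, and the finite-difference step are all hypothetical choices.

```python
import math

def psi(x):
    # Hypothetical primitive mapping on R^3: only the second entry is changed,
    # to g(x) = x1*x2 + sin(x3) + 2*x2.
    x1, x2, x3 = x
    return [x1, x1 * x2 + math.sin(x3) + 2.0 * x2, x3]

def dg_dx2(x):
    # Exact partial derivative of g with respect to x2.
    return x[0] + 2.0

def jacobian(fn, x, h=1e-6):
    """Approximate Jacobian matrix [d fn_i / d x_j] by central differences."""
    n = len(x)
    columns = []
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp = fn(xp)
        fm = fn(xm)
        columns.append([(fp[i] - fm[i]) / (2.0 * h) for i in range(n)])
    # columns[j][i] holds d fn_i / d x_j; transpose so that rows are indexed by i.
    return [[columns[j][i] for j in range(n)] for i in range(n)]

def det3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

point = [0.5, -1.0, 2.0]
J = jacobian(psi, point)
det_J = det3(J)

print(det_J, dg_dx2(point))
```

The Jacobian is the identity except in its second row, so its determinant reduces to the diagonal entry of that row, which is $\partial g / \partial x_2 = x_1 + 2$.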

What needs to be proved for Theorem 3.34 is that apart from translations, any 
nonlinear $\varphi$ as in Theorem 3.34 can be decomposed into the product of primitive 
transformations and flips, at least locally. The argument will peel primitive 
mappings from the right side of $\varphi$ and flips from the left side. In that sense 
it will be a nonlinear version of column reduction with primitive mappings and 
row reduction with flips. The decomposition will be forced to be local because it 
uses the Inverse Function Theorem, which guarantees the existence of an inverse 
function only locally. 
Lemma 3.35. Suppose that $E$ is an open neighborhood of 0 in $\mathbb{R}^n$ and that 
$\varphi : E \to \mathbb{R}^n$ is a $C^1$ function such that $\varphi(0) = 0$ and $\varphi'(0)^{-1}$ exists. Then there 
is a subneighborhood of 0 in $\mathbb{R}^n$ in which $\varphi$ factors as 

\[ \varphi = \beta_1 \circ \cdots \circ \beta_{n-1} \circ \psi_n \circ \cdots \circ \psi_1, \]

where each $\beta_j$ is a flip or the identity and each $\psi_j$ is a primitive $C^1$ function in 
some open neighborhood of 0 such that $\psi_j(0) = 0$ and $\psi_j'(0)^{-1}$ exists. 


PROOF. Let us set up an inductive procedure by assuming at the start that 

\[ \varphi(x_1, \dots, x_n) = \begin{pmatrix} x_1 \\ \vdots \\ x_{i_0 - 1} \\ \varphi_{i_0}(x_1, \dots, x_n) \\ \vdots \\ \varphi_n(x_1, \dots, x_n) \end{pmatrix} \qquad (*) \]

with $1 \le i_0 \le n$. We shall make use of the following formula for multiplying two 
matrices $A$ and $B$ when $B$ has the property that it is equal to the identity matrix 
except possibly in row $i_0$. The formula is 

\[ (AB)_{ij} = \begin{cases} A_{i i_0} B_{i_0 j} + A_{ij} B_{jj} & \text{if } j \ne i_0, \\ A_{i i_0} B_{i_0 i_0} & \text{if } j = i_0. \end{cases} \qquad (**) \]

It will be convenient to identify linear functions like $\varphi'(x)$ with their matrices, so 
that the $(i, j)^{\text{th}}$ entry $\varphi'(x)_{ij}$ of $\varphi'(x)$ is meaningful. 

Let $j = j_0$ be the least row index for which the $(j, i_0)^{\text{th}}$ entry of $\varphi'(0)$ is 
nonzero. The index $j_0$ exists because $\varphi'(0)$ is nonsingular, and $j_0$ is $\ge i_0$ since 
the top $i_0 - 1$ rows of $\varphi'(x)$ match the corresponding rows of the identity matrix. 
Let 

\[ \beta_{i_0} = \begin{cases} \text{identity function} & \text{if } j_0 = i_0, \\ \text{flip of entries } j_0 \text{ and } i_0 & \text{if } j_0 > i_0. \end{cases} \]

Then $\beta_{i_0} \circ \varphi$ has the general form of $(*)$ except that the $i_0^{\text{th}}$ and $j_0^{\text{th}}$ entries have 
been interchanged. By inspection the Jacobian matrix at 0 of $\beta_{i_0} \circ \varphi$ equals the 
identity matrix in rows 1 through $i_0 - 1$ and has $(i_0, i_0)^{\text{th}}$ entry nonzero. 
Thus if we possibly incorporate a composition with a flip into the definition of 
$\varphi$, we may assume that $\varphi'(0)_{i_0 i_0}$ is nonzero. Put 

\[ \psi(x_1, \dots, x_n) = \begin{pmatrix} x_1 \\ \vdots \\ x_{i_0 - 1} \\ \varphi_{i_0}(x_1, \dots, x_n) \\ x_{i_0 + 1} \\ \vdots \\ x_n \end{pmatrix}. \]
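Then $\psi'(x)$ is an $n$-by-$n$ matrix with 

\[ \psi'(x)_{ij} = \begin{cases} \delta_{ij} & \text{if } i \ne i_0, \\ \varphi'(x)_{i_0 j} & \text{if } i = i_0, \end{cases} \]

where $\delta_{ij}$ is the Kronecker delta. Since $\det \psi'(0) = \varphi'(0)_{i_0 i_0} \ne 0$, we can 
apply the Inverse Function Theorem (Theorem 3.17) to $\psi$, obtaining a $C^1$ inverse 
function $\psi^{-1}$ that carries an open neighborhood of 0 onto an open subset of the 
domain of $\varphi$, has $\psi^{-1}(0) = 0$, and has derivative $(\psi^{-1})'(y) = \psi'(x)^{-1}$, where 
$x$ and $y$ are related by $y = \psi(x)$ and $x = \psi^{-1}(y)$. Using $(**)$, we readily verify 
that 

\[ (\psi'(x)^{-1})_{ij} = \begin{cases} \delta_{ij} & \text{if } i \ne i_0, \\ -(\varphi'(x)_{i_0 i_0})^{-1} \varphi'(x)_{i_0 j} & \text{if } i = i_0 \ne j, \\ (\varphi'(x)_{i_0 i_0})^{-1} & \text{if } i = j = i_0. \end{cases} \]

Therefore 

\[ ((\psi^{-1})'(y))_{ij} = \begin{cases} \delta_{ij} & \text{if } i \ne i_0, \\ -(\varphi'(x)_{i_0 i_0})^{-1} \varphi'(x)_{i_0 j} & \text{if } i = i_0 \ne j, \\ (\varphi'(x)_{i_0 i_0})^{-1} & \text{if } i = j = i_0. \end{cases} \]

Form $\eta = \varphi \circ \psi^{-1}$. By the chain rule (Theorem 3.10), we have $\eta'(0) = \varphi'(0)(\psi^{-1})'(0)$, 
and this is nonsingular. Combining the formula for $((\psi^{-1})'(y))_{ij}$ 
with the chain rule and $(**)$ gives 

\[
\begin{aligned}
\eta'(y)_{ij} &= \bigl(\varphi'(x)(\psi^{-1})'(y)\bigr)_{ij} \\
&= \begin{cases} \varphi'(x)_{i i_0} ((\psi^{-1})'(y))_{i_0 j} + \varphi'(x)_{ij} ((\psi^{-1})'(y))_{jj} & \text{if } j \ne i_0, \\ \varphi'(x)_{i i_0} ((\psi^{-1})'(y))_{i_0 i_0} & \text{if } j = i_0, \end{cases} \\
&= \begin{cases} \varphi'(x)_{i i_0} \bigl(-(\varphi'(x)_{i_0 i_0})^{-1} \varphi'(x)_{i_0 j}\bigr) + \varphi'(x)_{ij} & \text{if } j \ne i_0, \\ \varphi'(x)_{i i_0} (\varphi'(x)_{i_0 i_0})^{-1} & \text{if } j = i_0. \end{cases}
\end{aligned}
\]

Since $\varphi'(x)_{i i_0}$ is 0 for $i < i_0$, the above formula shows that $\eta'(y)_{ij} = \delta_{ij}$ for 
$i < i_0$. For $i = i_0$, the formula shows first that $\eta'(y)_{i_0 j}$ is 0 for $j \ne i_0$ and then 
that $\eta'(y)_{i_0 j}$ is 1 for $j = i_0$. Thus $\eta'(y)_{ij} = \delta_{ij}$ for $i \le i_0$. Consequently the $i^{\text{th}}$ 
entry of $\eta(y)$ is $y_i + c_i$ if $i \le i_0$, where $c_i$ is a constant. Evaluating $\eta$ at $y = 0$, 
we see that $c_i = 0$. Thus $\eta$ has the same general shape as $(*)$ except that the 
$i_0^{\text{th}}$ entry is now the $i_0^{\text{th}}$ coordinate of the variable. 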

Following this argument inductively for $i_0 = 1, \dots, n - 1$ leads us to a 
decomposition 

\[ \eta = \beta_{n-1} \circ \cdots \circ \beta_1 \circ \varphi \circ \psi_1^{-1} \circ \cdots \circ \psi_{n-1}^{-1}, \qquad (\dagger) \]

where each $\beta_j$ is a flip or the identity and where each $\psi_j$ is primitive. The function 
$\eta$ has $\eta(0) = 0$ and $\eta'(0)$ nonsingular, and $\eta$ has the form 

\[ \eta(x_1, \dots, x_n) = \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \\ \eta_n(x_1, \dots, x_n) \end{pmatrix}. \]

Therefore $\eta$ is primitive. Solving $(\dagger)$ for $\varphi$ thus exhibits $\varphi$ as decomposed into 
the required form. 
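In the linear case the procedure in the proof can be followed by hand. The sketch below works through a hypothetical 2-by-2 example, $\varphi(x) = Mx$ with $M = \begin{pmatrix} 0 & 2 \\ 1 & 3 \end{pmatrix}$, chosen only for illustration: a flip is needed because the upper left entry of $M$ is 0, then a primitive mapping $\psi_1$ is peeled off on the right, and what is left is a primitive mapping $\psi_2$. The helper `matmul` and all variable names are assumptions of the example.

```python
def matmul(a, b):
    """Product of two 2-by-2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical linear map phi(x) = M x with M nonsingular and M[0][0] == 0.
M = [[0.0, 2.0],
     [1.0, 3.0]]

# Step 1: the (1,1) entry is 0, so interchange the two output entries with a flip.
flip = [[0.0, 1.0],
        [1.0, 0.0]]
flipped = matmul(flip, M)            # now the (1,1) entry is nonzero

# Step 2: psi1 is the identity except in row 1, which copies row 1 of flipped.
psi1 = [[flipped[0][0], flipped[0][1]],
        [0.0, 1.0]]
psi1_inv = [[1.0 / psi1[0][0], -psi1[0][1] / psi1[0][0]],
            [0.0, 1.0]]

# Step 3: eta = (flip o phi) o psi1^{-1} has first row (1, 0), so it is primitive.
eta = matmul(flipped, psi1_inv)
psi2 = eta

# Solving for phi: phi = flip o psi2 o psi1, since a flip is its own inverse.
recomposed = matmul(flip, matmul(psi2, psi1))

print(eta, recomposed)
```

Here `eta` comes out as the diagonal matrix with entries 1 and 2, and recomposing the three factors recovers `M`, which matches the shape $\varphi = \beta_1 \circ \psi_2 \circ \psi_1$ of the lemma for $n = 2$.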


PROOF OF THEOREM 3.34. We are to prove that 

\[ \int_{\varphi(U)} f(y)\,dy = \int_U f(\varphi(x))\,|\det \varphi'(x)|\,dx \qquad (*) \]

whenever $\varphi : U \to \varphi(U)$ is a $C^1$ function between open sets with a $C^1$ inverse 
and $f : \varphi(U) \to \mathbb{R}$ is continuous and has compact support lying in $\varphi(U)$. In the 
argument we shall work with several functions in place of $\varphi$, and the set $U$ may be 
different for each. We have seen that $(*)$ holds if $\varphi$ is a flip or an invertible primitive 
function. Let us observe also that $(*)$ holds if $\varphi$ is a translation $\varphi(x) = x + x_0$ for 
some $x_0$ in $\mathbb{R}^n$; the reason is that $(*)$ in this case can be reduced via successive 
uses of Fubini’s Theorem (Corollary 3.33) to the 1-dimensional case, where we 
know it to be true by Theorem 1.34. 

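If $(*)$ holds when $\varphi$ is either $\alpha : U \to \alpha(U)$ or $\beta : \alpha(U) \to \beta(\alpha(U))$, then 
$(*)$ holds when $\varphi$ is the composition $\gamma = \beta \circ \alpha : U \to \beta(\alpha(U))$ because 

\[
\begin{aligned}
\int_{\mathbb{R}^n} f(z)\,dz &= \int_{\mathbb{R}^n} f(\beta(y))\,|\det \beta'(y)|\,dy \\
&= \int_{\mathbb{R}^n} f(\beta(\alpha(x)))\,|\det \beta'(\alpha(x))|\,|\det \alpha'(x)|\,dx \\
&= \int_{\mathbb{R}^n} f(\gamma(x))\,|\det(\beta'(\alpha(x))\alpha'(x))|\,dx \\
&= \int_{\mathbb{R}^n} f(\gamma(x))\,|\det \gamma'(x)|\,dx,
\end{aligned}
\]

the last two steps holding by the formula $\det(BA) = \det B \det A$ and the chain 
rule (Theorem 3.10). 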

For any $a$ in the given set $U$, Lemma 3.35 applies to the function $\varphi_a$ carrying 
$U - a$ to $\varphi(U) - \varphi(a)$ and defined by $\varphi_a(x) = \varphi(x + a) - \varphi(a)$ because 
$\varphi_a(0) = 0$ and $\varphi_a'(0) = \varphi'(a)$. The lemma produces an open neighborhood $E_a$ of 
0 on which $\varphi_a$ factors as a composition of flips and invertible primitive functions. 
If $\tau_{x_0}$ denotes the translation $\tau_{x_0}(x) = x + x_0$, then $\varphi_a = \tau_{-\varphi(a)} \circ \varphi \circ \tau_a$ shows 
that $\varphi = \tau_{\varphi(a)} \circ \varphi_a \circ \tau_{-a}$. Therefore $\varphi$ factors on the open neighborhood $E_a + a$ 
as the composition of translations, flips, and invertible primitive functions. From 
the previous paragraph we conclude for each $a \in U$ that $(*)$ holds for $\varphi$ if $f$ is 
continuous and is compactly supported in the open neighborhood $\varphi(E_a + a)$ of 
$\varphi(a)$. 

As $a$ varies through $U$, the subsets $V_a = \varphi(E_a + a)$ of $\varphi(U)$ form an open cover 
of $\varphi(U)$. Fix $f$ continuous with compact support $K$ in $\varphi(U)$. By compactness 
a finite subfamily of the family $\{V_a\}$ forms an open cover of $K$. Applying 
Proposition 3.14, we obtain a finite family $\Psi = \{\psi\}$ of continuous functions 
defined on $\varphi(U)$ and taking values in $[0, 1]$ with the properties that 

(i) each $\psi$ is 0 outside of some compact set contained in some $V_a$, 
(ii) $\sum_{\psi \in \Psi} \psi$ is identically 1 on $K$. 

Then property (i) and the conclusion of the previous paragraph show that $(*)$ holds 
for $\psi f$. From (ii), we have $\sum_{\psi \in \Psi} \psi f = f$ on $\varphi(U)$. Since there are only finitely 
many terms in the sum, we can interchange sum and integral and conclude that 
$(*)$ holds for $f$. This completes the proof. 


One final remark is appropriate: Theorem 3.34 immediately extends from the 
scalar-valued case as stated to the case that $f$ takes values in $\mathbb{R}^m$ or $\mathbb{C}^m$. 


11. Problems 


1. Let $\mathbb{F}$ be $\mathbb{R}$ or $\mathbb{C}$. Prove that the Hilbert-Schmidt norm satisfies 
(a) $|TS| \le |T||S|$ if $S$ is in $L(\mathbb{F}^n, \mathbb{F}^m)$ and $T$ is in $L(\mathbb{F}^m, \mathbb{F}^k)$, 
(b) $|1| = \sqrt{n}$ if $n = m$ and 1 denotes the identity function on $\mathbb{F}^n$. 

2. Suppose that $f : \mathbb{R}^n \to \mathbb{R}^m$ is a linear function with Jacobian matrix $A$. What is 
$f'(x_0)$? 

3. Suppose that $f : \mathbb{R}^n \to \mathbb{R}^1$ has $|f(x)| \le |x|^2$ for all $x$. Prove that $f$ is 
differentiable at $x = 0$. 

4. Let $x = (x_1, \dots, x_n)$ and $u = (u_1, \dots, u_n)$ be in $\mathbb{R}^n$. For $f : \mathbb{R}^n \to \mathbb{R}$ 
differentiable at $x$, use the chain rule to derive a formula for $\frac{d}{dt} f(x + tu)\big|_{t=0}$. 


5. Compute exprtX from the definition for X = (; ae (| ts ( e ae ca 


0-1 01 -10 io 
01 
and (?}). 
6. It was observed in Section 6 in the context of polar coordinates that the Implicit 
Function Theorem implies the Inverse Function Theorem. Namely, the pair of 
polar-coordinate formulas $(u, v) = (r\cos\theta, r\sin\theta)$ was inverted by applying 
the Implicit Function Theorem to the system of equations 

\[ r\cos\theta - u = 0, \qquad r\sin\theta - v = 0. \]

Using this example as a model, derive the Inverse Function Theorem in the 
general case from the Implicit Function Theorem in the general case. 


; N : : : 
Define if i to mean limy_so J ; When the integrand is continuous. Prove or 
disprove: 


[ [foe 2 2e) dx i I [fer — 2¢72*7) dy| ay 


Problems 8-9 use Fubini’s Theorem to supplement the theory of Fourier series as 
given in Section I.10. 

8. Let $f$ and $g$ be continuous complex-valued periodic functions of period $2\pi$, and 
define their convolution to be the function 

\[ f * g(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x - t) g(t)\,dt. \]

(a) Show that $f * g$ is continuous periodic and that $f * g = g * f$. 

(b) Let $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$ and $g(x) \sim \sum_{n=-\infty}^{\infty} d_n e^{inx}$. Prove that 
$f * g(x) \sim \sum_{n=-\infty}^{\infty} c_n d_n e^{inx}$. 

(c) Prove that the Fourier series of $f * g$ converges uniformly. 

9. Let $f$, $g$, and $h$ be continuous complex-valued periodic functions of period $2\pi$. 
Prove that $f * (g * h) = (f * g) * h$. 


Problems 10-13 deal with homogeneous functions. If $f : \mathbb{R}^n - \{0\} \to \mathbb{R}$ is a function 
not identically 0 such that $f(rx) = r^d f(x)$ for all $x$ in $\mathbb{R}^n - \{0\}$ and all $r > 0$, we say 
that $f$ is homogeneous of degree $d$. For example, the function in the first problem 
below is homogeneous of degree 0. 

10. On $\mathbb{R}^2$, define 

\[ f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases} \]

Prove that $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist everywhere in $\mathbb{R}^2$ and that $f$ is not continuous at $(0, 0)$. 


11. Let $f : \mathbb{R}^n - \{0\} \to \mathbb{R}$ be smooth and homogeneous of degree $d$. 

(a) Prove that if $d = 0$, then $f(x)$ is bounded on $\mathbb{R}^n - \{0\}$ and that $f$ extends 
to be continuous at 0 only if it is constant. 

(b) Prove that if $d > 0$, then the definition $f(0) = 0$ makes $f$ continuous for all 
$x$ in $\mathbb{R}^n$, whereas if $d < 0$, then no definition of $f(0)$ makes $f$ continuous at 0. 

(c) Prove that $\frac{\partial f}{\partial x_j}$ is homogeneous of degree $d - 1$ unless it is identically 0. 

(d) If $f$ is homogeneous of degree 1 and satisfies $f(-x) = -f(x)$ and $f(0) = 0$, 
prove that each $\frac{\partial f}{\partial x_j}$ exists at 0 but that $\frac{\partial f}{\partial x_j}$ is not continuous at 0 unless it 
is constant. 


12. On $\mathbb{R}^2$, let $f$ be the function homogeneous of degree 1 given by 


3 
Bios 2 fey) #00), 
0 if (x, y) = (0,0). 
(a) Prove that J is eres at (0,0). 
(b) Prove that 2 Land & exist at (0, 0) but are not continuous there. 
(c) Calculate 4 a i [es _oforx = Oandu = ( ). Show that the formula 
in Problem 4 fails, and conclude that f is not differentiable at (0, 0). 


13. On $\mathbb{R}^2$, let $f$ be the function homogeneous of degree 2 given by 


2152 
fay)= er if (x, y) # 0,0), 


0 if (x, y) = (0,0). 
(a) Prove that fa A a ‘ and Fy F are continuous on all of R?. 
(b) Prove that 2 
(c) Prove that 24 


and 2F ae at a 0) but are not continuous there. 


£0, ie S a and 2£(0,0) = 


7 


axdy ay ae 


Problems 14-15 concern “harmonic functions” in $\{(x, y) \in \mathbb{R}^2 \mid |(x, y)| < 1\}$, the 
open unit disk of the plane. A harmonic function is a complex-valued $C^2$ function 
satisfying the Laplace equation $\Delta u(x, y) = 0$, where $\Delta$ is the Laplacian 
$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$. 

14. If $(r, \theta)$ are regarded as polar coordinates, prove for all integers $n$ that each 
function $r^{|n|} e^{in\theta}$ is a $C^\infty$ function in the open unit disk and is harmonic there. 
Deduce that if $\{c_n\}$ is a doubly infinite sequence such that $\sum_{n=-\infty}^{\infty} c_n r^{|n|} e^{in\theta}$ 
converges absolutely for each $r$ with $0 < r < 1$, then the sum is a $C^\infty$ function 
in the open unit disk and is harmonic there. 

15. Prove that if $u$ is harmonic in the unit disk, then so is the function $u \circ R$, where 
$R$ is the rotation about the origin given by $\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$. 
Problems 16-20 illustrate the Inverse and Implicit Function Theorems. 


16. 


Verify that the equations u = x*y + x and v = x + y° define a function 


from IR* to R? whose derivative at (1, 1) is given by the matrix Ce A). This 
matrix being invertible, the Inverse Function Theorem applies. Let the locally 
defined C! inverse function be given by x = F(u,v) and y = G(u, v) in an 
open neighborhood of (u, v) = (2, 2), the point (2, 2) having the property that 
F (2,2) = 1 and G(2, 2) = 1. Find oF (2, 2). 
17. Show that the equations 

\[
\begin{aligned}
x^2 - y\cos(uv) + z^2 &= 0, \\
x^2 + y^2 - \sin(uv) + 2z^2 &= 2, \\
xy - \sin u \cos v + z &= 0,
\end{aligned}
\]

implicitly define $x, y, z$ as $C^1$ functions of $(u, v)$ near $x = 1$, $y = 1$, $u = \pi/2$, 
$v = 0$, and $z = 0$, and find $\frac{\partial x}{\partial u}$ and $\frac{\partial x}{\partial v}$ for the function $x(u, v)$. Is the function 
$x(u, v)$ of class $C^\infty$? 

18. Regard the operation of squaring an $n$-by-$n$ matrix as a function from $\mathbb{R}^{n^2}$ to $\mathbb{R}^{n^2}$, 
and show that this mapping is invertible on some open set of the domain that 
contains the identity matrix. 

19. (Lagrange multipliers) Let $f$ and $g$ be real-valued $C^1$ functions defined on an 
open subset $U$ of $\mathbb{R}^n$, and let $S = \{x \in U \mid g(x) = 0\}$. Prove that if $f|_S$ has a 
local maximum or minimum at a point $x_0$ of $S$, then either $g'(x_0) = 0$ or there 
exists a number $\lambda$ such that $f'(x_0) + \lambda g'(x_0) = 0$. 

20. (Arithmetic-geometric mean inequality) Using Lagrange multipliers, prove 
that any $n$ real numbers $a_1, \dots, a_n$ that are $> 0$ satisfy 

\[ \sqrt[n]{a_1 a_2 \cdots a_n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}. \]

CHAPTER IV 


Theory of Ordinary Differential Equations and Systems 


Abstract. This chapter treats the theory of ordinary differential equations, both linear and nonlinear. 
Sections 1-4 establish existence and uniqueness theorems for ordinary differential equations. 
The first section gives some examples of first-order equations, mostly nonlinear, to illustrate certain 
kinds of behavior of solutions. The second section shows, in the presence of continuity for a vector- 
valued $F$ satisfying a “Lipschitz condition,” that the first-order system $y' = F(t, y)$ has a unique 
local solution satisfying an initial condition $y(t_0) = y_0$. Since higher-order equations can always be 
reduced to first-order systems, these results address existence and uniqueness for $n^{\text{th}}$-order equations 
as a special case. Section 3 shows that the solutions to a system depend well on the initial condition 
and on any parameters that are present in $F$. Section 4 applies these results to existence of integral 
curves for a vector field and to construction of coordinate systems from families of integral curves. 
Sections 5—8 concern linear systems. Section 5 shows that local solutions of linear systems may 
be extended to global solutions and that in the homogeneous case the vector space of global solutions 
has dimension equal to the size of the system. The method of variation of parameters reduces the 
solution of any linear system to the solution of a homogeneous linear system. Sections 6—7 identify 
explicit solutions to $n^{\text{th}}$-order linear equations and first-order linear systems. The “Jordan canonical 
form” of a square matrix plays a role in the case of a system. Section 8 discusses power-series 
solutions to second-order homogeneous linear equations whose coefficients are given by convergent 
power series, as well as solutions that arise in the case of regular singular points. Two kinds of special 
functions are mentioned that result from this study — Legendre polynomials and Bessel functions. 


1. Qualitative Features and Examples 


To introduce the subject of ordinary differential equations, this section gives 
examples of some qualitative features and complicated phenomena that can occur 
in such equations. 

If $F$ is a complex-valued function of $n + 2$ variables, a function $y(t)$ is said to 
be a solution of the ordinary differential equation 

\[ F(t, y, y', y'', \dots, y^{(n)}) = 0 \]

of $n^{\text{th}}$ order on the open interval $(a, b)$ if 

\[ F(t, y(t), y'(t), \dots, y^{(n)}(t)) = 0 \]

identically for $a < t < b$. The equation is “ordinary” in the sense that there is 
only one independent variable. The equation is said to be linear if it is of the 
form 

\[ a_n(t) y^{(n)} + a_{n-1}(t) y^{(n-1)} + \cdots + a_1(t) y' + a_0(t) y = q(t), \]

and it is homogeneous linear if in addition, $q$ is the 0 function. A linear ordinary 
differential equation has constant coefficients if $a_n(t), \dots, a_0(t)$ are all constant 
functions. 

Let us come to examples, which will point toward the enormous variety of 
phenomena that can occur. We stick to the first-order case, and all the examples 
will have F real-valued. Let us look only for real-valued solutions. Pictures 
indicating the qualitative behavior of the solutions of each of the examples are in 
Figure 4.1. 


EXAMPLES. 


(1) Simple equations can have relatively complicated solutions. This is already 
true for the equation 

\[ y' = 1/t \quad \text{on the interval } (0, +\infty). \]

Integration shows that all solutions are of the form $\log t + c$; on an interval 
of negative $t$’s, the solutions are of the form $\log |t| + c$. The $c$ comes from a 
corollary of the Mean Value Theorem that says that a real-valued function on 
an open interval with 0 derivative everywhere is necessarily constant.¹ Another 
example, but with no singularity, is $y' = ty$. To solve this equation on intervals 
where $y(t) \ne 0$, write $y'/y = t$, so that $\log |y| = \frac{1}{2}t^2 + a$ and $|y| = e^a e^{t^2/2}$. 
Thus $y(t) = c e^{t^2/2}$, with $c \ne 0$ constant, on any interval where $y(t)$ is nowhere 
0. The function $y(t) = 0$ is a solution as well, and all real solutions on an interval 
are of the form $y(t) = c e^{t^2/2}$ with $c$ real. See Figures 4.1a and 4.1b. 


(2) Solutions may not be defined on obvious intervals. For the equation 

\[ t y' + y = \sin t, \]

we can recognize the two sides as $\frac{d}{dt}(ty)$ and $\frac{d}{dt}(-\cos t)$. Therefore $ty = c - \cos t$.
Dividing by $t$, we obtain $y(t) = \dfrac{c - \cos t}{t}$ on any interval that does not contain
0. What about intervals containing $t = 0$? If we put $t = 0$ in the formula
$ty = c - \cos t$, we see that $c$ must be 1. In this case we can define $y(0) = 0$ there,
and then $y'(0)$ exists. We obtain the additional solution

\[ y(t) = \begin{cases} \dfrac{1 - \cos t}{t} & \text{for } t \neq 0, \\[1ex] 0 & \text{for } t = 0, \end{cases} \]

on any open interval containing 0. Figure 4.1c shows graphs of some solutions.


¹See Section A2 of the appendix for further information.


FIGURE 4.1. Graphs of solutions of some first-order ordinary differential
equations: (a) $y' = 1/t$, (b) $y' = ty$, (c) $ty' + y = \sin t$,
(d) $y' = y^2 + 1$, (e) $y' = y^2$, (f) $y' = y^{2/3}$.


(3) Even if the equation seems nice for all t, the solutions may not exist for all 
t. An example occurs with 


\[ y' = y^2 + 1, \]


which we solve by the steps $\frac{d}{dt}(\arctan y) = 1$, $\arctan y = t + c$, $y = \tan(t + c)$.
The solutions behave badly when $t + c$ is any odd multiple of $\pi/2$. Solutions
are defined at most on intervals of length $\pi$. Figure 4.1d shows graphs of some
solutions for this example.


(4) Some solutions may look quite different from all the others. For example, 
with 
\[ y' = y^2, \]

we solve by $-1/y = t + c$ for $y \neq 0$, so that $y(t) = -\dfrac{1}{t + c}$. Also, $y(t) = 0$ is
a solution. Here the solutions of the form $y(t) = -\dfrac{1}{t + c}$ are not defined for all $t$,
but the solution $y(t) = 0$ is defined for all $t$. We might think of $y(t) = 0$ as the
limiting case with $c$ tending to $\infty$. Figure 4.1e shows graphs of some of the
solutions for this example.


(5) New solutions can sometimes be pieced together from old ones. For 
example, the equation 
\[ y' = y^{2/3} \]

is solved where $y \neq 0$ by the steps $y^{-2/3} y' = 1$, $3y^{1/3} = t + c$, and $y(t) =
\tfrac{1}{27}(t + c)^3$. But also $y(t) = 0$ is a solution. In fact, we can piece solutions of
these types together. For example, the function

\[ y(t) = \begin{cases} \tfrac{1}{27}(t + 1)^3 & \text{for } t < -1, \\ 0 & \text{for } -1 \le t \le 0, \\ \tfrac{1}{27}\, t^3 & \text{for } 0 < t, \end{cases} \]

is a solution on $(-\infty, +\infty)$. Figure 4.1f shows graphs of some of the solutions
for this example.
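
The pieced-together solution of Example 5 can be checked numerically. The following Python sketch is an illustration only: the helper names `pieced`, `zero`, `rhs`, and `residual` are hypothetical, and $y^{2/3}$ is read as the real quantity $|y|^{2/3}$ (an assumption needed for negative $y$). It compares a central-difference derivative with the right side of the equation at a few sample points, including the two junction points.

```python
def pieced(t):
    """The pieced-together solution from Example 5."""
    if t < -1.0:
        return (t + 1.0) ** 3 / 27.0
    if t <= 0.0:
        return 0.0
    return t ** 3 / 27.0


def zero(t):
    """The identically zero solution."""
    return 0.0


def rhs(y):
    """Right side y^(2/3), read as |y|^(2/3) so that it is real for y < 0."""
    return abs(y) ** (2.0 / 3.0)


def residual(f, t, h=1e-6):
    """|f'(t) - rhs(f(t))|, with f' approximated by a central difference."""
    slope = (f(t + h) - f(t - h)) / (2.0 * h)
    return abs(slope - rhs(f(t)))


sample_points = [-3.0, -1.5, -1.0, -0.5, 0.0, 0.5, 2.0]
worst_pieced = max(residual(pieced, t) for t in sample_points)
worst_zero = max(residual(zero, t) for t in sample_points)
```

Both `worst_pieced` and `worst_zero` come out negligibly small, while `pieced` and `zero` agree at $t = -0.5$ and differ at $t = 2$. Two different solutions therefore pass through the same point $(-0.5, 0)$.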


One thing that stands out in the above examples is that the set of solutions seems
to depend, more or less, on a single parameter $c$. The inference is that nothing
much worse than the $c$ occurs because somewhere an integration is taking place
and the Mean Value Theorem is controlling how many indefinite integrals there
can be. One way of trying to quantify this statement about how the number of
solutions is limited is to say that for any fixed $t = t_0$ and given real number $y_0$,
there is only one solution $y(t)$ near $t_0$ with $y(t_0) = y_0$. This statement is not quite
accurate, however, as Example 5 shows. The uniqueness theorem in Section 2
will give a precise result. The data $(t_0, y_0)$ are called an initial condition.

Something else that stands out, although perhaps not without the visual aid of
the graphs of solutions as in Figure 4.1, is that the graphed solutions appear to fill
the entire part of the plane corresponding to the $t$'s under study. In the framework
of the previous paragraph, the statement is that for any fixed $t = t_0$ and given
real number $y_0$, there exists a solution $y(t)$ near $t_0$ with $y(t_0) = y_0$. The existence
theorem in Section 2 will give a precise result.


WEAK VERSION OF EXISTENCE AND UNIQUENESS THEOREMS. Let $D$ be a
nonempty convex open set in $\mathbb{R}^2$, and let $(t_0, y_0)$ be in $D$. If $F : D \to \mathbb{R}$ is a
continuous function such that $\frac{\partial F}{\partial y}(t, y)$ exists and is continuous in $D$, then for
any sufficiently small open interval of $t$'s containing $t_0$, the equation $y' = F(t, y)$
has a unique solution $y(t)$ with $y(t_0) = y_0$ such that the graph of $t \mapsto y(t)$ lies
in $D$.

An improved theorem, together with a proof, will be given in Section 2. The
proof of existence uses “Picard iterations,” and the idea is as follows. First we
convert the differential equation into an equivalent integral equation

\[ y(t) = \int_{t_0}^{t} F(s, y(s))\,ds + y_0. \]

Second we use the right side as input and the left side as output to define successive
approximations to a solution:

\[ \begin{aligned}
y_0(t) &= y_0, \\
y_1(t) &= \int_{t_0}^{t} F(s, y_0(s))\,ds + y_0, \\
&\ \,\vdots \\
y_n(t) &= \int_{t_0}^{t} F(s, y_{n-1}(s))\,ds + y_0.
\end{aligned} \]

Third we use the Weierstrass M test to show that the series with partial sums
$y_N(t) = y_0 + \sum_{n=1}^{N} (y_n(t) - y_{n-1}(t))$ is uniformly convergent. If the limiting
function is denoted by $y(t)$, we check that $y(t)$ satisfies the integral equation from
which we started. Hence $y(t)$ is a solution of the differential equation.
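
The three steps can be tried on the equation $y' = ty$ of Example 1 with $y(0) = 1$, whose solution is $e^{t^2/2}$. The Python sketch below is only an illustration under that choice of $F$, $t_0 = 0$, and $y_0 = 1$. The names `picard_step`, `picard_iterates`, and `evaluate` are hypothetical. Each iterate is a polynomial in $t$, stored as a list of exact rational coefficients.

```python
import math
from fractions import Fraction


def picard_step(poly, y0):
    """One Picard step for F(s, y) = s*y with t0 = 0.

    poly[k] is the coefficient of t**k in the current iterate.
    """
    # Multiplying by s shifts every coefficient up by one degree.
    shifted = [Fraction(0)] + poly
    # Integrate term by term from 0 to t, then add the initial value y0.
    integral = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(shifted)]
    integral[0] += y0
    return integral


def picard_iterates(n, y0=Fraction(1)):
    """Return the iterates y_0, y_1, ..., y_n as coefficient lists."""
    current = [y0]
    iterates = [current]
    for _ in range(n):
        current = picard_step(current, y0)
        iterates.append(current)
    return iterates


def evaluate(poly, t):
    """Evaluate a coefficient list at the real number t."""
    return sum(float(c) * t ** k for k, c in enumerate(poly))


y3 = picard_iterates(3)[-1]      # 1 + t^2/2 + t^4/8 + t^6/48
y10_at_1 = evaluate(picard_iterates(10)[-1], 1.0)
gap = abs(y10_at_1 - math.exp(0.5))
```

In this example the iterates are exactly the partial sums of the power series of $e^{t^2/2}$, so `gap` is far below $10^{-9}$.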


2. Existence and Uniqueness 


In this section we state and prove the main existence and uniqueness theorems for 
solutions of ordinary differential equations. First let us establish an appropriate 
setting more general than the one in Section 1. 

The examples in Section 1 were all of the first order. They could all have
been written in the form $y' = F(t, y)$ with $F$ real-valued, and we considered
real-valued solutions $y(t)$. From equations as simple as $y'' + y' + y = 0$, whose
real-valued solutions are

\[ y(t) = a_1 e^{-t/2} \cos\bigl(\tfrac{\sqrt3}{2}\,t\bigr) + a_2 e^{-t/2} \sin\bigl(\tfrac{\sqrt3}{2}\,t\bigr), \]

we know that it can be easier to work, at least initially, with complex-valued
solutions. In this particular case, it is easier as a first step to find all complex-valued
solutions, namely

\[ y(t) = c_1 \exp\bigl(\tfrac12(-1 + i\sqrt3)\,t\bigr) + c_2 \exp\bigl(\tfrac12(-1 - i\sqrt3)\,t\bigr), \]

and then to extract the real-valued solutions from them. The solution method,
which will be discussed in more detail in Section 6 below, involves finding all 
complex solutions of a certain polynomial equation with real coefficients, and the 
method is more natural if the coefficients of the polynomial equation are allowed 
to be complex. 

Thus right away, it is natural to consider first-order equations y’ = F(t, y) 
with F complex-valued and to look for complex-valued solutions. The theory in 
Chapter III avoided working with functions of several variables in which some of 
the variables are complex, and we can update the theory of Chapter III here. The 
technique, which is to consider the complex variable $y$ as two real variables $\operatorname{Re} y$
and $\operatorname{Im} y$, is again applicable. Thus we have only to think of $F(t, y)$ as a function of
three real variables, even if we do not separate $y$ into its two components in writing
$F(t, y)$, and the theory of Chapter III applies directly. In adopting the point of
view that $y$ is actually two real variables, we need to apply the same consideration
to $y'$, and we are led to view $y' = F(t, y)$ as a system of two simultaneous
equations, namely $\operatorname{Re} y' = \operatorname{Re} F(t, y)$ and $\operatorname{Im} y' = \operatorname{Im} F(t, y)$. This viewpoint
merely makes our functions conform to the prescriptions of Chapter III. It is not 
necessary to work with the expanded notation; all we have to remember is that in 
this part of the theory we never differentiate a function with respect to a complex 
variable. 

The utility of allowing y’ = F(t, y) to represent a system of ordinary dif- 
ferential equations has, in any event, been thrust upon us. Let us consider the 
notion of a system a bit more. With a little trick the second-order equation 
$y'' + y' + y = 0$ can itself be transformed into a system, quite apart from the issue
of real vs. complex variables. The trick is to introduce two unknown functions
$u_1$ and $u_2$ to play the roles of $y$ and $y'$. Then $u_1$ and $u_2$ satisfy $u_2 = u_1'$ and
$u_2' = u_1'' = y'' = -y' - y = -u_2 - u_1$. In other words, $u_1$ and $u_2$ satisfy the
system

\[ \begin{aligned} u_1' &= u_2, \\ u_2' &= -u_1 - u_2. \end{aligned} \]


Conversely if $u_1(t)$ and $u_2(t)$ satisfy this system of equations, then $y(t) = u_1(t)$
is a solution of $y'' + y' + y = 0$. In this way, the given second-order equation is
completely equivalent to a certain system of two first-order equations with two 
unknown functions. 
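
The equivalence can be illustrated numerically. The sketch below integrates the first-order system for $(u_1, u_2)$ with a classical fourth-order Runge-Kutta step and compares $u_1$ with the closed-form real solution having $y(0) = 1$ and $y'(0) = 0$. The Runge-Kutta method, the step count, and the names `rk4_step`, `system`, `solve`, and `exact` are choices made here for illustration and are not part of the text's development.

```python
import math


def rk4_step(f, t, u, h):
    """One classical Runge-Kutta step for u' = f(t, u), u a list of floats."""
    k1 = f(t, u)
    k2 = f(t + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k1)])
    k3 = f(t + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k2)])
    k4 = f(t + h, [ui + h * ki for ui, ki in zip(u, k3)])
    return [ui + h / 6 * (a + 2 * b + 2 * c + d)
            for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]


def system(t, u):
    """The first-order system u1' = u2, u2' = -u1 - u2."""
    u1, u2 = u
    return [u2, -u1 - u2]


def solve(t_end, steps=2000):
    """Integrate from t = 0 with u1(0) = 1, u2(0) = 0; return [u1, u2]."""
    h = t_end / steps
    t, u = 0.0, [1.0, 0.0]
    for _ in range(steps):
        u = rk4_step(system, t, u, h)
        t += h
    return u


def exact(t):
    """Real solution of y'' + y' + y = 0 with y(0) = 1, y'(0) = 0."""
    w = math.sqrt(3) / 2
    return math.exp(-t / 2) * (math.cos(w * t) + math.sin(w * t) / (2 * w))


error = abs(solve(2.0)[0] - exact(2.0))
```

The first component of the computed solution agrees with the closed form to well within $10^{-8}$ at $t = 2$, which is what the equivalence predicts.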

Let $F$ be a function defined on an open set $D$ of $\mathbb{R} \times \mathbb{C}^{k(m+1)}$ and taking values
in $\mathbb{C}^k$. A $\mathbb{C}^k$-valued function $y(t) = (y_1(t), \dots, y_k(t))$ is said to be a solution
of the system $F(t, y, y', \dots, y^{(m)}) = 0$ of $k$ ordinary differential equations of
order $m$ in the open interval $(a, b)$ if $F(t, y(t), y'(t), \dots, y^{(m)}(t)) = 0$ identically
for $a < t < b$.

We saw that the single second-order equation $y'' + y' + y = 0$ is equivalent to
a certain first-order system of two equations, and the technique for exhibiting this
equivalence works more generally: a system of $k$ equations of order $m$ that has
been solved for the $m^{\text{th}}$-order derivatives is equivalent to a system of $km$ equations
of first order.

We shall consider first-order systems of the form $y' = F(t, y)$, where $F$ is
continuous on an open subset $D$ of $\mathbb{R} \times \mathbb{C}^n$ and takes values in $\mathbb{C}^n$. The example
$y' = y^{2/3}$ in Section 1 fits these hypotheses, and we saw that the hoped-for
uniqueness fails for this equation. In the weak theorem stated at the end of
Section 1, an additional hypothesis was imposed in order to address this problem:
for $y' = F(t, y)$ with only real-valued solutions of interest, the hypothesis is
that $\partial F/\partial y$ exists and is continuous on the domain $D$ of $F$. Generalizing this
condition presumably means saying something about partial derivatives in each
of the directions $y_j$ for $1 \le j \le n$. In addition, we must remember the injunction
against differentiating with respect to complex variables. Thus we really expect
a condition concerning $2n$ first-order derivatives. Fortunately there is an easily
stated less-stringent condition that is nevertheless good enough. The condition is
that $F$ satisfy a Lipschitz condition in its $y$ variable, i.e., that there exist a real
number $k$ such that

\[ |F(t, y_1) - F(t, y_2)| \le k\,|y_1 - y_2| \]

for all pairs of points $(t, y_1)$ and $(t, y_2)$ in the domain $D$ of $F$.

If $F$ is a real-valued continuous function of two real variables with a continuous
partial derivative in the second variable, then the Mean Value Theorem gives

\[ F(t, y_1) - F(t, y_2) = (y_1 - y_2)\,\frac{\partial F}{\partial y}(t, \xi) \]

with $\xi$ between $y_1$ and $y_2$, provided the line segment from $(t, y_1)$ to $(t, y_2)$ lies in
the domain $D$ of $F$. The partial derivative is bounded on any compact subset of
$D$, and thus $F$ satisfies, on any compact convex subset of $D$, a Lipschitz condition
in the second variable.
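
A quick numerical comparison shows what the Lipschitz condition rules out. The Python sketch below is an illustration with hypothetical helper names (`quotient`, `F_bad`, `F_good`). It computes difference quotients for the right side $|y|^{2/3}$ of Example 5 (reading $y^{2/3}$ as a real quantity, an assumption made here) and for the right side $y^2 + 1$ of Example 3.

```python
def quotient(F, y1, y2):
    """Difference quotient |F(y1) - F(y2)| / |y1 - y2| for y1 != y2."""
    return abs(F(y1) - F(y2)) / abs(y1 - y2)


def F_bad(y):
    """Right side of Example 5, read as |y|^(2/3)."""
    return abs(y) ** (2.0 / 3.0)


def F_good(y):
    """Right side of Example 3."""
    return y * y + 1.0


# Near y = 0 the quotient for F_bad equals |y|^(-1/3) and is unbounded.
bad_quotients = [quotient(F_bad, 10.0 ** (-p), 0.0) for p in (3, 6, 9)]

# On |y| <= 3 the derivative 2y of F_good is bounded by 6, so the
# Mean Value Theorem bounds every quotient on that set by 6.
good_quotient = quotient(F_good, 3.0, -2.5)
```

The entries of `bad_quotients` grow roughly like 10, 100, 1000, so no single constant $k$ works near $y = 0$. The quotient for `F_good` stays below the bound 6 supplied by the Mean Value Theorem.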


Theorem 4.1 (Picard-Lindelöf Existence Theorem). Let $D$ be a nonempty
open set in $\mathbb{R}^1 \times \mathbb{C}^n$, let $(t_0, y_0)$ be in $D$, and suppose that $F : D \to \mathbb{C}^n$ is
a continuous function such that $F(t, y)$ satisfies a Lipschitz condition in the $y$
variable and has $|F(t, y)| \le M$ on $D$. Let $R$ be a compact set in $\mathbb{R}^1 \times \mathbb{C}^n$ of the
form

\[ R = \{(t, y) \mid |t - t_0| \le a \text{ and } |y - y_0| \le b\}, \]

and suppose that $R$ is contained in $D$. Put $a' = \min\{a, b/M\}$. Then there exists
a solution $y(t)$ of the system

\[ y' = F(t, y) \]

on the open interval $|t - t_0| < a'$ satisfying the initial condition

\[ y(t_0) = y_0. \]


REMARKS. A variant of Theorem 4.1 takes $D$ to be in $\mathbb{R}^1 \times \mathbb{C}^n$ but insists
only on continuity of $F$, not on the Lipschitz condition. Then a local solution still
exists for $|t - t_0| < a'$. This better result, known as the “Cauchy-Peano Existence
Theorem,” appears in Problems 20-25 at the end of the chapter and is proved by
an argument using Ascoli's Theorem. However, Example 5 in Section 1 shows
that there is no corresponding uniqueness theorem, and within the text we omit the
proof of the better existence theorem. Another variant of Theorem 4.1 assumes
that the domain $D$ of a given $F_{\mathbb{R}}$ lies in $\mathbb{R}^1 \times \mathbb{R}^n$, $F_{\mathbb{R}}$ takes values in $\mathbb{R}^n$, and $y_0$ is
in $\mathbb{R}^n$. Then $y' = F_{\mathbb{R}}(t, y)$ has a solution $y(t)$ such that $y(t_0) = y_0$ and the range
of $y$ lies in $\mathbb{R}^n$. In fact, when $F_{\mathbb{R}}$ satisfies a Lipschitz condition in the $y$ variable, this
variant is a consequence of Theorem 4.1 as stated. To derive this variant, one
extends the given function $F_{\mathbb{R}}$ from the subset of $\mathbb{R}^1 \times \mathbb{R}^n$ to a subset of $\mathbb{R}^1 \times \mathbb{C}^n$
by making it constant in $\operatorname{Im} y$. Specifically the new system is $y' = F(t, y)$ with
$F(t, y) = F_{\mathbb{R}}(t, \operatorname{Re} y)$, and the initial condition remains as $y(t_0) = y_0$. The part
of the system corresponding to equations for $\operatorname{Im} y'$ is just $\operatorname{Im} y' = 0$, since $F$ is
real-valued, and therefore $\operatorname{Im} y(t)$ is constant. Since $y_0$ is real, $\operatorname{Im} y(t)$ must be
0. Thus Theorem 4.1 yields a solution $y(t)$ with range in $\mathbb{R}^n$ under these special
hypotheses.


PROOF. The first step is to see that the set of differentiable functions $t \mapsto y(t)$
on $|t - t_0| < a'$ satisfying $y' = F(t, y)$ and $y(t_0) = y_0$ is the same as the set of
continuous functions $t \mapsto y(t)$ on $|t - t_0| < a'$ satisfying the integral equation
$y(t) = \int_{t_0}^{t} F(s, y(s))\,ds + y_0$.

If $y$ is differentiable and satisfies the differential equation and the initial condition,
then $y$ is certainly continuous and hence $s \mapsto F(s, y(s))$ is continuous.
Then $\int_{t_0}^{t} F(s, y(s))\,ds$ is differentiable by the Fundamental Theorem of Calculus
(Theorem 1.32), and the differential equation shows that $y(t)$ and $\int_{t_0}^{t} F(s, y(s))\,ds$
have the same derivative for $|t - t_0| < a'$. Thus they differ by a constant. The
constant is checked by putting $t = t_0$, and indeed $y$ satisfies the integral equation.

Conversely if $y$ is continuous and satisfies the integral equation, then
$s \mapsto F(s, y(s))$ is continuous, and the Fundamental Theorem of Calculus shows
that $\int_{t_0}^{t} F(s, y(s))\,ds$ is differentiable. This function equals $y(t) - y_0$ by the
integral equation, and hence $y$ is differentiable. Differentiating the two sides of
the integral equation, we see that $y$ satisfies the differential equation. Also, if
we put $t = t_0$ in the integral equation, we see that $y$ satisfies the initial condition
$y(t_0) = y_0$.

Thus it is enough to prove existence for a continuous solution of the integral
equation. For $t_0 - a' < t < t_0 + a'$, define inductively

\[ \begin{aligned}
y_0(t) &= y_0, \\
y_1(t) &= y_0 + \int_{t_0}^{t} F(s, y_0(s))\,ds, \\
y_n(t) &= y_0 + \int_{t_0}^{t} F(s, y_{n-1}(s))\,ds,
\end{aligned} \]

with the usual convention that $\int_{t_0}^{t} = -\int_{t}^{t_0}$. Let us see inductively that the graph
of $y_n(t)$ lies in the set

\[ R' = \{(t, y) \mid |t - t_0| \le a' \text{ and } |y - y_0| \le b\} \]

for $|t - t_0| < a'$. The graph of $y_0(t) = y_0$ is just $\{(t, y_0) \mid |t - t_0| < a'\}$,
and this lies in $R'$. The inductive hypothesis is that $(t, y_{n-1}(t))$ lies in $R'$ for
$|t - t_0| < a'$. Then

\[ |y_n(t) - y_0| = \Bigl| \int_{t_0}^{t} F(s, y_{n-1}(s))\,ds \Bigr| \le M\,|t - t_0| \le M a' \le b, \]

and therefore $(t, y_n(t))$ lies in $R'$ for $|t - t_0| < a'$. This completes the induction,
and hence the graph of $y_n(t)$ lies in $R'$ for $|t - t_0| < a'$.
Now write

\[ y_N(t) = y_0(t) + \sum_{n=1}^{N} \bigl[ y_n(t) - y_{n-1}(t) \bigr] \]

for $N \ge 0$. We shall use the Weierstrass M test (Proposition 1.20), adapted to a
series of functions with values in $\mathbb{C}^n$, to prove uniform convergence of this series.
Thus we are to bound $|y_n(t) - y_{n-1}(t)|$, and we shall do so inductively for $n \ge 1$.
We start from the inequality $|F(t, y)| \le M$ on $R'$ and the Lipschitz condition

\[ |F(t, y_j(t)) - F(t, y_{j-1}(t))| \le k\,|y_j(t) - y_{j-1}(t)| \quad \text{for } j \ge 1. \]


Say that $t_0 \le t < t_0 + a'$ for definiteness. Then

\[ |y_1(t) - y_0(t)| = \Bigl| \int_{t_0}^{t} F(s, y_0(s))\,ds \Bigr| \le M(t - t_0) \]

and

\[ \begin{aligned}
|y_2(t) - y_1(t)| &= \Bigl| \int_{t_0}^{t} \bigl[ F(s, y_1(s)) - F(s, y_0(s)) \bigr]\,ds \Bigr| \\
&\le \int_{t_0}^{t} |F(s, y_1(s)) - F(s, y_0(s))|\,ds \\
&\le \int_{t_0}^{t} k\,|y_1(s) - y_0(s)|\,ds \\
&\le \int_{t_0}^{t} kM(s - t_0)\,ds \qquad \text{from the previous display} \\
&= \frac{Mk(t - t_0)^2}{2!}.
\end{aligned} \]

Now we carry out an induction. The base case is the estimate carried out above for
$|y_1(t) - y_0(t)|$. The estimate for $|y_2(t) - y_1(t)|$ suggests the inductive hypothesis,
namely the inequality

\[ |y_{n-1}(t) - y_{n-2}(t)| \le \frac{M k^{n-2} (t - t_0)^{n-1}}{(n-1)!}. \]

Then we have

\[ \begin{aligned}
|y_n(t) - y_{n-1}(t)| &\le \int_{t_0}^{t} |F(s, y_{n-1}(s)) - F(s, y_{n-2}(s))|\,ds \\
&\le \int_{t_0}^{t} k\,|y_{n-1}(s) - y_{n-2}(s)|\,ds \\
&\le M k^{n-1} \int_{t_0}^{t} \frac{(s - t_0)^{n-1}}{(n-1)!}\,ds \qquad \text{by inductive hypothesis} \\
&= \frac{M k^{n-1} (t - t_0)^n}{n!},
\end{aligned} \]

and the induction is complete. The argument when $t_0 - a' < t \le t_0$ is completely
similar, and the form of the estimate for the two cases combined is

\[ |y_n(t) - y_{n-1}(t)| \le \frac{M k^{n-1} |t - t_0|^n}{n!} \quad \text{for } |t - t_0| < a'. \]

There is no harm in assuming that $k$ is $> 0$, and consequently

\[ |y_n(t) - y_{n-1}(t)| \le \frac{M}{k}\,\frac{k^n (a')^n}{n!} \]

independently of $t$. Since $\sum_{n=0}^{\infty} (n!)^{-1} k^n (a')^n = e^{k a'}$ is finite, the M test applies
and shows that our series converges uniformly.

Thus $y_N(t)$ converges uniformly for $|t - t_0| < a'$, necessarily to a continuous
function. We call this function $y(t)$. For $|t - t_0| < a'$, we have

\[ \begin{aligned}
\int_{t_0}^{t} F(s, y(s))\,ds &= \int_{t_0}^{t} \bigl[ F(s, y(s)) - F(s, y_N(s)) \bigr]\,ds + \int_{t_0}^{t} F(s, y_N(s))\,ds \\
&= \int_{t_0}^{t} \bigl[ F(s, y(s)) - F(s, y_N(s)) \bigr]\,ds + y_{N+1}(t) - y_0.
\end{aligned} \]

On the right side, we have $\lim_N [y_{N+1}(t) - y_0] = y(t) - y_0$. Because of the
Lipschitz condition the absolute value of the first term on the right side is

\[ \le a' k \sup_{|t - t_0| < a'} |y(t) - y_N(t)|, \]

and this tends to 0 as $N$ tends to infinity. Thus

\[ \int_{t_0}^{t} F(s, y(s))\,ds = y(t) - y_0, \]

and $y(t)$ is a continuous solution of the integral equation.


Theorem 4.2 (uniqueness theorem). Let $D$ be a nonempty open set in $\mathbb{R}^1 \times \mathbb{C}^n$,
let $(t_0, y_0)$ be in $D$, and suppose that $F : D \to \mathbb{C}^n$ is a continuous function such
that $F(t, y)$ satisfies a Lipschitz condition in the $y$ variable. For any $a'' > 0$,
there exists at most one solution $y(t)$ to the system

\[ y' = F(t, y) \]

on the open interval $|t - t_0| < a''$ satisfying the initial condition

\[ y(t_0) = y_0. \]


PROOF. As in the proof of Theorem 4.1, it is enough to prove uniqueness for the
integral equation. Suppose that $y(t)$ and $z(t)$ are two solutions for $|t - t_0| < a''$.
Fix $\epsilon > 0$. Then $|y(t) - z(t)|$ is bounded by some constant $C$ for $|t - t_0| \le a'' - \epsilon$,
and $F$ is assumed to satisfy a Lipschitz condition $|F(t, y_1) - F(t, y_2)| \le k\,|y_1 - y_2|$
on $D$.

We argue as in the proof of Theorem 4.1, working first for $t_0 \le t$ and starting
from

\[ |y(t) - z(t)| \le C \]

and from

\[ \begin{aligned}
|y(t) - z(t)| &= \Bigl| \int_{t_0}^{t} \bigl[ F(s, y(s)) - F(s, z(s)) \bigr]\,ds \Bigr| \\
&\le \int_{t_0}^{t} |F(s, y(s)) - F(s, z(s))|\,ds \\
&\le \int_{t_0}^{t} k\,|y(s) - z(s)|\,ds \\
&\le Ck(t - t_0).
\end{aligned} \]

Inductively we suppose that

\[ |y(t) - z(t)| \le \frac{C k^{n-1} (t - t_0)^{n-1}}{(n-1)!}. \]

Then

\[ \begin{aligned}
|y(t) - z(t)| &\le \int_{t_0}^{t} |F(s, y(s)) - F(s, z(s))|\,ds \\
&\le \int_{t_0}^{t} k\,|y(s) - z(s)|\,ds \\
&\le C k^n \int_{t_0}^{t} \frac{(s - t_0)^{n-1}}{(n-1)!}\,ds = \frac{C k^n (t - t_0)^n}{n!},
\end{aligned} \]

and thus $|y(t) - z(t)| \le C (n!)^{-1} k^n (t - t_0)^n$ for all $n$. A similar estimate is valid
for $t \le t_0$, and the combined estimate is

\[ |y(t) - z(t)| \le \frac{C k^n |t - t_0|^n}{n!}. \]

Since $\sum_n C (n!)^{-1} k^n |t - t_0|^n$ converges, the individual terms tend to 0. Therefore
$y(t) = z(t)$ for $|t - t_0| \le a'' - \epsilon$. Since $\epsilon$ is arbitrary, $y(t) = z(t)$ for $|t - t_0| < a''$.


3. Dependence on Initial Conditions and Parameters 


In abstract settings where the existence and uniqueness theorems play a role, it
is frequently of interest to know how the unique solution depends on the initial
data $(t_0, y_0)$ such that $y(t_0) = y_0$. To quantify this dependence, let us write the
unique solution corresponding to $y' = F(t, y)$ as $y(t, t_0, y_0)$ rather than $y(t)$.
We continue to use $y'$ to indicate the derivative in the $t$ variable even though the
differentiation is now actually a partial derivative.


Theorem 4.3. Let $D$ be a nonempty open set in $\mathbb{R}^1 \times \mathbb{C}^n$, let $(t^*, y^*)$ be in $D$,
and suppose that $F : D \to \mathbb{C}^n$ is a continuous function such that $F(t, y)$ satisfies
a Lipschitz condition in the $y$ variable. Let $R$ be a compact set in $\mathbb{R}^1 \times \mathbb{C}^n$ of the
form

\[ R = \{(t, y) \mid |t - t^*| \le a \text{ and } |y - y^*| \le b\}, \]

suppose that $R$ is contained in $D$, and let $M$ be an upper bound for $|F|$ on $R$. Put
$a' = \min\{a, b/M\}$. If $|t_0 - t^*| < a'/2$ and $|y_0 - y^*| < b/2$, then there exists a
unique solution $t \mapsto y(t, t_0, y_0)$ on the interval $|t - t_0| < a'/2$ to the system and
initial data

\[ y' = F(t, y) \quad \text{and} \quad y(t_0, t_0, y_0) = y_0, \]

and the function $(t, t_0, y_0) \mapsto y(t, t_0, y_0)$ is continuous on the open set

\[ U = \{(t, t_0, y_0) \mid |t - t_0| < a'/2,\ |t_0 - t^*| < a'/2,\ |y_0 - y^*| < b/2\}. \]

If $F$ is smooth on $D$, then $(t, t_0, y_0) \mapsto y(t, t_0, y_0)$ is smooth on $U$.


REMARK. It is customary to summarize the result about continuity qualitatively
by saying that the unique solution depends continuously on the initial data.


PROOF OF CONTINUITY. Let us first check that there is indeed a unique solution
for each pair $(t_0, y_0)$ in question and that its graph, as a function of $t$, lies in

\[ R' = \{(t, y) \mid |t - t^*| \le a' \text{ and } |y - y^*| \le b\}. \]

For this purpose, fix $t_0$ and $y_0$ with $|t_0 - t^*| < a'/2$ and $|y_0 - y^*| < b/2$. Use
of the triangle inequality shows that the closed set with $|t - t_0| \le a'/2$ and
$|y - y_0| \le b/2$ lies within $R$. Thus $|F| \le M$ on this set. Theorem 4.1 shows
that there exists a solution with graph in this smaller set for $|t - t_0| < a''$, where
$a'' = \min\{a'/2, (b/2)/M\}$. Now

\[ \min\{a'/2,\ b/(2M)\} = \tfrac12 \min\{a', b/M\} = \tfrac12 a', \]

and hence there exists a solution for $|t - t_0| < a'/2$ with graph in $R$. This solution
$y(t, t_0, y_0)$ is unique by Theorem 4.2, and it is the result of the construction in the
proof of Theorem 4.1.

The idea is to trace through the construction in the proof of Theorem 4.1 and
to see that the function $(t, t_0, y_0) \mapsto y(t, t_0, y_0)$ is the uniform limit of explicit
continuous functions on $U$. Imitating a part of the proof of Theorem 4.1, we
define, for $(t, t_0, y_0)$ in $U$,

\[ \begin{aligned}
y_0(t, t_0, y_0) &= y_0, \\
y_1(t, t_0, y_0) &= y_0 + \int_{t_0}^{t} F(s, y_0(s, t_0, y_0))\,ds, \\
y_n(t, t_0, y_0) &= y_0 + \int_{t_0}^{t} F(s, y_{n-1}(s, t_0, y_0))\,ds.
\end{aligned} \]

We shall show by induction that $y_n(t, t_0, y_0)$ is continuous on $U$. Certainly
$y_0(t, t_0, y_0)$ is continuous on $U$.

For the inductive step we need a preliminary calculation. Let $I_1$ be the closed
interval between $t_0$ and $t$, and let $I_2$ be the closed interval between $t_0'$ and $t'$.
Suppose we have two functions $f_1$ and $f_2$ of a variable $s$ such that

(i) $f_1$ is defined for $s$ between $t_0$ and $t$ with $|f_1| \le M$ there,
(ii) $f_2$ is defined for $s$ between $t_0'$ and $t'$ with $|f_2| \le M$ there, and
(iii) $|f_1(s) - f_2(s)| \le \epsilon$ on their common domain.

If $a'$ is $\ge$ the maximum distance among $t_0$, $t$, $t_0'$, $t'$, let us show that

\[ \Bigl| \int_{t_0}^{t} f_1(s)\,ds - \int_{t_0'}^{t'} f_2(s)\,ds \Bigr| \le M\bigl(|t_0 - t_0'| + |t - t'|\bigr) + a'\epsilon. \qquad (*) \]

To show this for all possible order relations on the set $\{t_0, t_0', t, t'\}$, we observe
that there is no loss of generality in assuming that $t_0$ is the smallest member of
the set. There are then six cases.


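Case 1. $t_0 \le t_0' \le t' \le t$, so that (iii) applies on $[t_0', t']$. Then

\[ \int_{t_0}^{t} f_1(s)\,ds - \int_{t_0'}^{t'} f_2(s)\,ds = \int_{t_0}^{t_0'} f_1(s)\,ds + \int_{t_0'}^{t'} \bigl(f_1(s) - f_2(s)\bigr)\,ds + \int_{t'}^{t} f_1(s)\,ds, \]

and hence

\[ \Bigl| \int_{t_0}^{t} f_1(s)\,ds - \int_{t_0'}^{t'} f_2(s)\,ds \Bigr| \le M|t_0' - t_0| + \epsilon|t' - t_0'| + M|t - t'|. \]

Therefore $(*)$ holds in this case.

Case 2. $t_0 \le t_0' \le t \le t'$, so that (iii) applies on $[t_0', t]$. Then

\[ \int_{t_0}^{t} f_1(s)\,ds - \int_{t_0'}^{t'} f_2(s)\,ds = \int_{t_0}^{t_0'} f_1(s)\,ds + \int_{t_0'}^{t} \bigl(f_1(s) - f_2(s)\bigr)\,ds - \int_{t}^{t'} f_2(s)\,ds, \]

and hence

\[ \Bigl| \int_{t_0}^{t} f_1(s)\,ds - \int_{t_0'}^{t'} f_2(s)\,ds \Bigr| \le M|t_0' - t_0| + \epsilon|t - t_0'| + M|t' - t|. \]

Therefore $(*)$ holds in this case.

Case 3. $t_0 \le t \le t' \le t_0'$. Then

\[ \Bigl| \int_{t_0}^{t} f_1(s)\,ds \Bigr| \le M|t - t_0| \le M\bigl(|t_0' - t_0| - |t_0' - t'|\bigr) \quad \text{and} \quad \Bigl| \int_{t_0'}^{t'} f_2(s)\,ds \Bigr| \le M|t_0' - t'|, \]

so that $(*)$ holds in this case.

Case 4. $t_0 \le t' \le t_0' \le t$. Then

\[ \Bigl| \int_{t_0}^{t} f_1(s)\,ds \Bigr| \le M|t - t_0| = M\bigl(|t_0' - t_0| + |t - t_0'|\bigr) \quad \text{and} \quad \Bigl| \int_{t_0'}^{t'} f_2(s)\,ds \Bigr| \le M|t' - t_0'| = M\bigl(|t - t'| - |t - t_0'|\bigr), \]

so that $(*)$ holds in this case.

Case 5. $t_0 \le t \le t_0' \le t'$. Then

\[ \Bigl| \int_{t_0}^{t} f_1(s)\,ds \Bigr| \le M|t - t_0| \le M|t_0' - t_0| \quad \text{and} \quad \Bigl| \int_{t_0'}^{t'} f_2(s)\,ds \Bigr| \le M|t' - t_0'| \le M|t' - t|, \]

so that $(*)$ holds in this case.

Case 6. $t_0 \le t' \le t \le t_0'$. Then

\[ \Bigl| \int_{t_0}^{t} f_1(s)\,ds \Bigr| \le M|t - t_0| = M\bigl(|t_0' - t_0| - |t_0' - t|\bigr) \quad \text{and} \quad \Bigl| \int_{t_0'}^{t'} f_2(s)\,ds \Bigr| \le M|t_0' - t'| = M\bigl(|t' - t| + |t - t_0'|\bigr), \]

so that $(*)$ holds in this case.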


With $(*)$ proved we can now proceed with the inductive step to show that
$y_n(t, t_0, y_0)$ is continuous on $U$. Thus assume that $y_{n-1}(t, t_0, y_0)$ is continuous
on $U$. If $(t, t_0, y_0)$ and $(t', t_0', y_0')$ are in $U$, then

\[ \begin{aligned}
y_n(t, t_0, y_0) - y_n(t', t_0', y_0')
&= (y_0 - y_0') + \int_{t_0}^{t} F(s, y_{n-1}(s, t_0, y_0))\,ds - \int_{t_0'}^{t'} F(s, y_{n-1}(s, t_0', y_0'))\,ds \\
&= (y_0 - y_0') + \int_{t_0}^{t} f_1(s)\,ds - \int_{t_0'}^{t'} f_2(s)\,ds,
\end{aligned} \]

where $f_1(s) = F(s, y_{n-1}(s, t_0, y_0))$ and $f_2(s) = F(s, y_{n-1}(s, t_0', y_0'))$. Thus $(*)$
gives

\[ |y_n(t, t_0, y_0) - y_n(t', t_0', y_0')| \le |y_0 - y_0'| + M\bigl(|t_0 - t_0'| + |t - t'|\bigr) + a'\epsilon \qquad (**) \]

if $\epsilon$ is chosen such that $|f_1(s) - f_2(s)| \le \epsilon$ on the common domain of $f_1$ and $f_2$.
Let $\epsilon > 0$ be given, and choose some $\delta > 0$ for uniform continuity of $F$ on
the set $R$. By uniform continuity of $y_{n-1}$, choose $\eta > 0$ such that

\[ |y_{n-1}(s, t_0, y_0) - y_{n-1}(s, t_0', y_0')| < \delta \quad \text{whenever } |(s, t_0, y_0) - (s, t_0', y_0')| < \eta. \]

Then $|(s, t_0, y_0) - (s, t_0', y_0')| < \eta$ implies $|f_1(s) - f_2(s)| < \epsilon$ on the common
domain of $f_1$ and $f_2$, and hence $(**)$ holds. Therefore $y_n$ is continuous as a
function on $U$. This completes the induction.

We know that $y_n(t, t_0, y_0)$ converges to a solution $y(t, t_0, y_0)$ uniformly in $t$ if
$(t_0, y_0)$ is fixed. Let us see that the convergence is in fact uniform in $(t, t_0, y_0)$.
The proof of Theorem 4.1 yielded the estimate

\[ |y_n(t, t_0, y_0) - y_{n-1}(t, t_0, y_0)| \le \frac{M}{k}\,\frac{k^n (a')^n}{n!}, \]

and this is independent of $(t, t_0, y_0)$. Therefore the Weierstrass M test shows
that $y_n(t, t_0, y_0)$ converges to $y(t, t_0, y_0)$ uniformly on $U$. The uniform limit of
continuous functions is continuous by Proposition 2.21, and hence $y(t, t_0, y_0)$ is
continuous.


PROOF OF SMOOTHNESS. Under the assumption that $F$ is smooth on $D$, we are to
prove that $y(t, t_0, y_0)$ is smooth on $U$. We return to the earlier proof of continuity
of $y(t, t_0, y_0)$ and show that each $y_n(t, t_0, y_0)$ is smooth. This smoothness is
trivial for $n = 0$, we assume inductively that $y_{n-1}(t, t_0, y_0)$ is smooth, and we
form

\[ y_n(t, t_0, y_0) = y_0 + \int_{t_0}^{t} F(s, y_{n-1}(s, t_0, y_0))\,ds. \]

The function on the right side is the composition of $(t, t_0, y_0) \mapsto (t, t_0, t_0, y_0)$ followed
by $(t, t_0, s_0, y_0) \mapsto \int_{t_0}^{t} F(s, y_{n-1}(s, s_0, y_0))\,ds$. The chain rule (Theorem
3.10), the Fundamental Theorem of Calculus (Theorem 1.32), and Proposition
3.28 allow us to compute partial derivatives of this function, and another argument
with $(*)$ allows us to see that the partial derivatives are continuous. There is no
difficulty in iterating this argument, and we conclude that $y_n(t, t_0, y_0)$ is smooth.
The same argument in the proof of Theorem 4.1 that enabled us to estimate
the size of $y_n(t, t_0, y_0) - y_{n-1}(t, t_0, y_0)$ allows us to estimate any iterated partial
derivative of this difference. New constants enter the estimate, but the qualitative
result is the same, namely that any iterated partial derivative of $y_n(t, t_0, y_0)$ converges
uniformly to that same iterated partial derivative of $y(t, t_0, y_0)$. Applying
Theorem 1.23, we see that $y(t, t_0, y_0)$ is smooth.


CONCLUDING REMARK. Sometimes a given system $y' = F(t, y)$ with initial
condition $y(t_0) = y_0$ involves parameters in the definition of $F$, so that effectively
the system is $y' = F(t, y, \lambda_1, \dots, \lambda_k)$. A natural problem is to find conditions
under which the dependence of the solution on the $k$ parameters is continuous or
smooth. The answer is that this problem can be reduced to the problem addressed
by Theorem 4.3. We simply introduce $k$ additional variables $z_j$, one for each
parameter $\lambda_j$, together with new equations $z_j' = 0$ and new initial conditions
$z_j(t_0) = \lambda_j$.
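
The reduction of parameters to initial data can be seen in a one-parameter example. The Python sketch below is an illustration only: the equation $y' = \lambda y$ with $y(0) = 1$, the Runge-Kutta integrator, and the names `augmented_rhs`, `rk4`, and `solve_with_parameter` are assumptions made here. The parameter $\lambda$ is carried as an extra unknown $z$ with $z' = 0$ and $z(0) = \lambda$, so that it enters only through the initial condition.

```python
import math


def augmented_rhs(t, u):
    """Augmented system: y' = z*y and z' = 0, where z carries lambda."""
    y, z = u
    return [z * y, 0.0]


def rk4(f, u0, t_end, steps=1000):
    """Integrate u' = f(t, u) from t = 0 to t_end by classical RK4."""
    h = t_end / steps
    t, u = 0.0, list(u0)
    for _ in range(steps):
        k1 = f(t, u)
        k2 = f(t + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k1)])
        k3 = f(t + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k2)])
        k4 = f(t + h, [ui + h * ki for ui, ki in zip(u, k3)])
        u = [ui + h / 6 * (a + 2 * b + 2 * c + d)
             for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]
        t += h
    return u


def solve_with_parameter(lam, t_end):
    """Solve y' = lam*y, y(0) = 1 by passing lam in as initial data for z."""
    return rk4(augmented_rhs, [1.0, lam], t_end)


y_value, z_value = solve_with_parameter(0.5, 1.0)
```

Here `z_value` remains equal to the parameter, and `y_value` agrees closely with $e^{\lambda t}$ at $\lambda = 0.5$, $t = 1$. Varying `lam` now amounts to varying an initial condition, which is the situation covered by Theorem 4.3.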


4. Integral Curves


If $U$ is an open subset of $\mathbb{R}^n$, then a vector field on $U$ may be defined as a function
$X : U \to \mathbb{R}^n$. The vector field is smooth if $X$ is a smooth function. In classical
notation, $X$ is written $X = \sum_{j=1}^{n} a_j(x_1, \dots, x_n)\,\frac{\partial}{\partial x_j}$, and the function carries
$(x_1, \dots, x_n)$ to $(a_1(x_1, \dots, x_n), \dots, a_n(x_1, \dots, x_n))$. The traditional geometric
interpretation of $X$ is to attach to each point $p$ of $U$ the vector $X(p)$ as an arrow
based at $p$. This interpretation is appropriate, for example, if $X$ represents the
velocity vector at each point in space of a time-independent fluid flow.

We have defined the term “path” in a metric space to mean a continuous
function from a closed bounded interval of $\mathbb{R}^1$ into the metric space. The term
curve in a metric space is used to refer to a continuous function from an open
interval of $\mathbb{R}^1$ into the metric space.

A standard problem in connection with vector fields on an open subset $U$ of
$\mathbb{R}^2$ is to try to draw curves within $U$ with the property that the tangent vector
to the curve at any point matches the arrow for the vector field. An illustration 
occurs in Figure 4.2. This section abstracts and generalizes this kind of curve. 




FIGURE 4.2. Integral curve of a vector field. 


Let $X : U \to \mathbb{R}^n$ be a smooth vector field on $U$. A curve $c(t)$ is an integral
curve for $X$ if $c$ is smooth and $c'(t) = X(c(t))$ for all $t$ in the domain of
$c$. Depending on one's interpretation of the informal wording in the previous
paragraph, the present definition is perhaps more demanding than the definition
given for $\mathbb{R}^2$ above: the expression $c'(t)$ involves both magnitude and direction,
and the present definition insists that both ingredients match with $X(c(t))$, not
just the direction.
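
The distinction between matching direction and matching both magnitude and direction can be made concrete. In the Python sketch below, the rotation field $X(x, y) = (-y, x)$ on $\mathbb{R}^2$ and the helper names `X`, `c`, `c_fast`, `velocity`, and `defect` are choices made here for illustration. The curve $c(t) = (\cos t, \sin t)$ is tested against the definition by a central-difference approximation to $c'(t)$, and so is the reparametrized curve $t \mapsto c(2t)$, which traces the same circle twice as fast.

```python
import math


def X(p):
    """Rotation vector field X(x, y) = (-y, x)."""
    x, y = p
    return (-y, x)


def c(t):
    """Unit-speed circle through p = (1, 0)."""
    return (math.cos(t), math.sin(t))


def c_fast(t):
    """Same circle traversed at twice the speed."""
    return c(2.0 * t)


def velocity(curve, t, h=1e-6):
    """Central-difference approximation to the tangent vector curve'(t)."""
    ahead, behind = curve(t + h), curve(t - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(ahead, behind))


def defect(curve, t):
    """Largest component of curve'(t) - X(curve(t))."""
    v, w = velocity(curve, t), X(curve(t))
    return max(abs(vi - wi) for vi, wi in zip(v, w))


defect_c = defect(c, 0.7)
defect_fast = defect(c_fast, 0.7)
```

The value `defect_c` is negligible, so $c$ satisfies the definition at the sample point. The value `defect_fast` is close to 1: the tangent vector of the faster curve points along the arrow of $X$ but has twice its length, so that curve is not an integral curve for $X$.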


Proposition 4.4. Let $X : U \to \mathbb{R}^n$ be a smooth vector field on an open subset
$U$ of $\mathbb{R}^n$, and let $p$ be in $U$. Then there exist an $\epsilon > 0$ and an integral curve
$c : (-\epsilon, \epsilon) \to U$ such that $c(0) = p$. Any two integral curves $c$ and $d$ for $X$
having $c(0) = d(0) = p$ coincide on the intersection of their domains.


PROOF. Apart from the smoothness the first conclusion is just a restatement
of a special case of Theorem 4.1 in different notation. The conditions on $c$ are
that $c$ be a solution of $c' = X(c)$ and that $c(0) = p$. The existence of a solution
is immediate from Theorem 4.1 if we put $F = X$, $c = y$, $t_0 = 0$, and $y_0 = p$.
The way in which this application of Theorem 4.1 is a special case and not the 
general case is that F is independent of t here. The smoothness of c follows from 
Theorem 4.3, and the uniqueness follows from Theorem 4.2. 


The interest is not only in Proposition 4.4 in isolation but also in what happens 
to the integral curves when X is part of a family of vector fields. 


Proposition 4.5. Let $X^{(1)}, \dots, X^{(m)}$ be smooth vector fields on an open subset
$U$ of $\mathbb{R}^n$, let $p$ be in $U$, and let $V$ be a bounded open neighborhood of 0 in $\mathbb{R}^m$. For
$\lambda$ in $V$, put $X_\lambda = \sum_{j=1}^{m} \lambda_j X^{(j)}$. Then there exist an $\epsilon > 0$ and a system of integral
curves $c(t, \lambda)$, defined for $t \in (-\epsilon, \epsilon)$ and $\lambda \in V$, such that $c(\,\cdot\,, \lambda)$ is an integral
curve for $X_\lambda$ with $c(0, \lambda) = p$. Each curve $c(t, \lambda)$ is unique, and the function
$c : (-\epsilon, \epsilon) \times V \to U$ is smooth. If $m = n$, if the vectors $X^{(1)}(p), \dots, X^{(n)}(p)$
are linearly independent, and if $\delta$ is any positive number less than $\epsilon$, then the
Jacobian matrix of $\lambda \mapsto c(\delta, \lambda)$ at $\lambda = 0$ is nonsingular.


REMARK. In the final conclusion of this proposition, the open neighborhood
of 0 within $V$ is allowed to depend on $\delta$. It follows from the final conclusion that
the Inverse Function Theorem (Theorem 3.17) and its corollary (Corollary 3.21)
are applicable to the mapping $\lambda \mapsto c(\delta, \lambda)$ at $\lambda = 0$. These results produce a
smooth inverse function carrying an open subneighborhood of $p$ within $U$ onto
an open subneighborhood of 0 within $V$. In effect the inverse function assigns locally
defined coordinates in $\lambda$ space to a neighborhood of $p$ in $U$.


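PROOF. We set up the system of equations $c' = X_\lambda \circ c$, i.e.,

\[ \frac{dc}{dt} = \sum_{j=1}^{m} \lambda_j X^{(j)}(c), \]

with initial condition $c(0) = p$. This is a smooth system of the kind considered
in Theorem 4.3, and the $\lambda_j$ with $1 \le j \le m$ are parameters. The parameters
are handled by the concluding remark in Section 3: we obtain unique solutions
$c(t, \lambda)$ for $t$ in some open interval $(-\epsilon, \epsilon)$, and $(t, \lambda) \mapsto c(t, \lambda)$ is smooth.

Now suppose that $m = n$, that the vectors $X^{(1)}(p), \dots, X^{(n)}(p)$ are linearly
independent, and that $0 < \delta < \epsilon$. The function $c$ satisfies

\[ c'(t, \lambda) = \sum_{j=1}^{n} \lambda_j X^{(j)}(c(t, \lambda)), \qquad (*) \]

and we use this information to compute the Jacobian matrix of $\lambda \mapsto c(\delta, \lambda)$ at
$\lambda = 0$. The Fundamental Theorem of Calculus, Proposition 3.28, and $(*)$ give

\[ \begin{aligned}
\frac{\partial c_i}{\partial \lambda_j}(\delta, \lambda)
&= \frac{\partial c_i}{\partial \lambda_j}(0, \lambda) + \int_0^{\delta} \frac{\partial}{\partial t}\Bigl[ \frac{\partial c_i}{\partial \lambda_j} \Bigr](t, \lambda)\,dt \\
&= \frac{\partial c_i}{\partial \lambda_j}(0, \lambda) + \int_0^{\delta} \frac{\partial}{\partial \lambda_j}\, c_i'(t, \lambda)\,dt \\
&= \frac{\partial c_i}{\partial \lambda_j}(0, \lambda) + \int_0^{\delta} X_i^{(j)}(c(t, \lambda))\,dt + \sum_{k=1}^{n} \lambda_k \int_0^{\delta} \frac{\partial}{\partial \lambda_j}\bigl( X_i^{(k)}(c(t, \lambda)) \bigr)\,dt.
\end{aligned} \]

Now $c_i(0, \lambda) = p_i$ for all $\lambda$, and hence $\frac{\partial c_i}{\partial \lambda_j}(0, \lambda)\big|_{\lambda = 0} = 0$. Also, $c(t, 0)$ is constant
in $t$ by $(*)$, and the constant is $c(0, 0) = p$. Finally when $\lambda$ is set equal to 0 in
the term $\sum_{k=1}^{n} \lambda_k \int_0^{\delta} \frac{\partial}{\partial \lambda_j}\bigl( X_i^{(k)}(c(t, \lambda)) \bigr)\,dt$, each $\lambda_k$ becomes 0, and thus the whole
term becomes 0. Thus the above equation specializes at $\lambda = 0$ to

\[ \frac{\partial c_i}{\partial \lambda_j}(\delta, \lambda)\Big|_{\lambda = 0} = 0 + \delta X_i^{(j)}(p) + 0. \]

The vectors $X^{(j)}(p)$ are by assumption linearly independent, and hence the determinant
of the matrix $[X_i^{(j)}(p)]$ is not 0. Consequently the Jacobian matrix of
$\lambda \mapsto c(\delta, \lambda)$ at $\lambda = 0$ is nonsingular if $\delta \neq 0$.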


5. Linear Equations and Systems, Wronskian 


Recall from Section 1 that a linear ordinary differential equation is defined to
be an equation of the type

\[ a_n(t)y^{(n)} + a_{n-1}(t)y^{(n-1)} + \cdots + a_1(t)y' + a_0(t)y = q(t) \]

with real or complex coefficients. The equation is homogeneous if $q$ is the 0
function, inhomogeneous in general. In order for the existence and uniqueness
theorems of Section 1 to apply, we need to be able to solve for $y^{(n)}$ and have all
coefficients be continuous afterward. Thus we assume that $a_n(t) = 1$ and that
$a_{n-1}(t), \dots, a_0(t)$ and $q(t)$ are continuous on some open interval.

Even in simple cases, the theory is helped by converting a single equation to a 
system of first-order equations. In Section 1 we saw an indication that a way to 
make this conversion is to put 


\[ y_1 = y, \quad y_2 = y', \quad \dots, \quad y_{n-1} = y^{(n-2)}, \quad y_n = y^{(n-1)} \]

and get

\[
\begin{aligned}
y_1' &= y_2, \\
y_2' &= y_3, \\
&\;\;\vdots \\
y_{n-1}' &= y_n, \\
y_n' &= -a_0(t)y_1 - \cdots - a_{n-1}(t)y_n + q(t).
\end{aligned}
\]


If we change the meaning of the symbol $y$ from a scalar-valued function to the
vector-valued function $y = (y_1, \dots, y_n)$, then we arrive at the system

\[ y' = A(t)y + Q(t), \]

where $A(t)$ is the n-by-n matrix of continuous functions given by

\[
A(t) = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_0(t) & -a_1(t) & -a_2(t) & \cdots & -a_{n-1}(t)
\end{pmatrix}
\]

and $Q(t)$ is the n-component column vector of continuous functions given by

\[ Q(t) = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ q(t) \end{pmatrix}. \]

In a general linear first-order system of the kind we shall study, $A(t)$ can be
any n-by-n matrix of continuous functions and $Q(t)$ can be any column vector
of continuous functions; thus the first-order system obtained by conversion of a
single $n^{\text{th}}$-order equation is of quite a special form among all first-order linear
systems.
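
The conversion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `companion_system` and the convention of passing the coefficients as a list `[a_0, ..., a_{n-1}]` of callables are hypothetical choices made for the sketch, not notation from the text.

```python
def companion_system(coeffs, q):
    """Return (A, Q) for y^(n) + a_{n-1}(t) y^(n-1) + ... + a_0(t) y = q(t).

    coeffs is the list [a_0, a_1, ..., a_{n-1}] of callables of t
    (an input convention assumed for this sketch), and q is a callable of t.
    """
    n = len(coeffs)

    def A(t):
        # Rows 1..n-1 encode y_i' = y_{i+1}: a single 1 just above the diagonal.
        rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
        # Last row encodes y_n' = -a_0(t) y_1 - ... - a_{n-1}(t) y_n + q(t).
        rows.append([-a(t) for a in coeffs])
        return rows

    def Q(t):
        return [0] * (n - 1) + [q(t)]

    return A, Q


# y'' + y = 0 becomes y1' = y2, y2' = -y1.
A, Q = companion_system([lambda t: 1, lambda t: 0], lambda t: 0)
print(A(0.0))  # [[0, 1], [-1, 0]]
print(Q(0.0))  # [0, 0]
```

A general first-order linear system need not have this shape; the sketch produces only the special matrices that arise from a single equation.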

For a system $y' = A(t)y + Q(t)$ as above, the Lipschitz condition for the
function $F(t, y) = A(t)y + Q(t)$ is automatic, since

\[ |F(t, y) - F(t, y^*)| = |A(t)(y - y^*)| \le \|A(t)\|\,|y - y^*| \]

and since the function $t \mapsto \|A(t)\|$ is bounded on any compact subinterval of our
domain interval. By the uniqueness theorem (Theorem 4.2), a unique solution
to the system is determined by data $(t_0, y_0)$, the local solution corresponding to
$(t_0, y_0)$ being the one satisfying the initial condition that the vector $y(t_0)$ equal the
vector $y_0$. If we track down what these data correspond to in the case of a single
$n^{\text{th}}$-order equation, we see that a unique solution to a single $n^{\text{th}}$-order equation
of the kind described above is determined by initial values at a point $t_0$ for the
scalar-valued solution and all its derivatives through order $n - 1$.

First-order linear systems of size one can be solved explicitly in terms of known
functions and integrations. Specifically the single homogeneous first-order
equation $y' = a(t)y$ is solved by $y(t) = c\exp\big(\int^t a(s)\,ds\big)$, and the solution of a
single inhomogeneous first-order equation can be reduced to the homogeneous
case by the variation-of-parameters formula that appears later in this section.
However, there need not be such an elementary solution of a first-order linear 
system of size two, not even a system that comes from a single second-order 
equation. Elementary solutions exist when the coefficient matrix has constants 
as entries, and we shall address that case in the next two sections. Sometimes 
one can write down tidy power-series solutions when the coefficient matrix has 
nonconstant entries, and we shall take up that matter later in the chapter. For 
now, we develop some general theory about first-order linear systems, beginning 
with the homogeneous case. The linearity implies that the set of solutions to 
the system y’ = A(t)y on an open interval is a vector space (of vector-valued 
functions) in the sense that it is closed under addition and scalar multiplication. 


Theorem 4.6. Let $y' = A(t)y$ be a homogeneous linear first-order n-by-n
system with $A(t)$ continuous for $a < t < b$. Then

(a) any solution on a subinterval $(a', b')$ extends to a solution on the whole
interval $(a, b)$,

(b) the dimension of the vector space of solutions on any subinterval $(a', b')$
is exactly $n$,

(c) if $v_1(t), \dots, v_n(t)$ are solutions on an interval $(a', b')$ and if $t_0$ is in that
interval, then $v_1, \dots, v_n$ are linearly independent functions if and only if
the column vectors $v_1(t_0), \dots, v_n(t_0)$ are linearly independent.


PROOF. We begin by proving (c). If $c_1v_1(t) + \cdots + c_nv_n(t)$ is identically 0
for constants $c_1, \dots, c_n$ not all 0, then $c_1v_1(t_0) + \cdots + c_nv_n(t_0) = 0$ for the same
constants. Conversely suppose that $c_1v_1(t_0) + \cdots + c_nv_n(t_0) = 0$ for constants
not all 0. Put $v(t) = c_1v_1(t) + \cdots + c_nv_n(t)$. Then $v(t)$ and the 0 function are
solutions of the system satisfying the same initial conditions, namely that they are 0 at
$t_0$. By the uniqueness theorem (Theorem 4.2), $v(t)$ is the 0 function. This proves
(c).

The upper bound in (b) is immediate from (c) since the dimension of the space
of n-component column vectors is $n$.

Let us prove that $n$ is a lower bound for the dimension in (b) if the interval
containing $t_0$ is sufficiently small. By the existence theorem (Theorem 4.1), there
exists a solution $v_j(t)$ on some interval $|t - t_0| < \varepsilon_j$ such that $v_j(t_0) = e_j$. The
$v_j(t)$ are then solutions on $|t - t_0| < \varepsilon$ with $\varepsilon = \min\{\varepsilon_1, \dots, \varepsilon_n\}$, and they are
linearly independent by (c). Hence the dimension of the space of solutions is at
least $n$ on the interval $|t - t_0| < \varepsilon$ or on any subinterval containing $t_0$.

We are not completely done with proving (b), but let us now prove (a). Let $v(t)$
be a solution on $(a', b')$. If we have a collection of solutions on different intervals
containing $(a', b')$ and each pair of solutions is consistent on their common
domain, then the union of the solutions is a solution. Consequently we may
assume that $v(t)$ does not extend to a solution on any larger interval. We are
to prove that $(a', b') = (a, b)$. Suppose on the contrary that $b' < b$. We use
$t_0 = b'$ in the previous paragraph of the proof; the result is that on some interval
$|t - b'| < \varepsilon$ with $\varepsilon$ sufficiently small and at least small enough so that $a' < b' - \varepsilon$,
the space of solutions has dimension $n$ with a basis $\{v_1, \dots, v_n\}$. By (c), the
column vectors $v_1(b' - \varepsilon/2), \dots, v_n(b' - \varepsilon/2)$ are linearly independent, and thus the
restrictions of $v_1, \dots, v_n$ to $(b' - \varepsilon, b')$ are linearly independent. The restriction
of $v(t)$ to the interval $(b' - \varepsilon, b')$ is a solution, and thus there exist constants
$c_1, \dots, c_n$ such that

\[ v(t) = c_1v_1(t) + \cdots + c_nv_n(t) \qquad \text{for } b' - \varepsilon < t < b'. \]

But then the function equal to $v(t)$ on $(a', b')$ and equal to $c_1v_1(t) + \cdots + c_nv_n(t)$
on $(b' - \varepsilon, b' + \varepsilon)$ extends $v(t)$ to a solution on a larger interval and contradicts
the maximality of the domain of $v(t)$. This proves that $b' = b$. Similarly we find
that $a' = a$. This proves (a).

We return to the unproved part of (b). Fix $t_0$ in $(a', b')$. On a subinterval
about $t_0$, the space of solutions has dimension $n$, as we have already proved. Let
$\{v_1, \dots, v_n\}$ be a basis. By (a), we can extend $v_1, \dots, v_n$ to solutions on $(a', b')$.
Then the space of solutions on $(a', b')$ has dimension at least $n$, and (b) is now
completely proved.


EXAMPLE. Let us illustrate the content of Theorem 4.6 by means of a single
second-order equation, namely $y'' + y = 0$. We know that $c_1\cos t + c_2\sin t$ is a
solution for every pair of constants $c_1$ and $c_2$. To convert the equation to a system,
we introduce $y_1 = y$ and $y_2 = y'$. The system is then

\[ y_1' = y_2, \qquad y_2' = -y_1, \]

and hence the matrix is $A(t) = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, a matrix of constants. The scalar-valued
solutions $\cos t$ and $\sin t$ of $y'' + y = 0$ correspond to the vector-valued solutions
$\begin{pmatrix} \cos t \\ -\sin t \end{pmatrix}$ and $\begin{pmatrix} \sin t \\ \cos t \end{pmatrix}$, respectively; each of these has a scalar-valued solution in
its first entry and the derivative in the second entry. In either case, both solutions
are defined on the interval $(-\infty, +\infty)$. The theorem says that the restrictions
of these two functions to any subinterval span the solutions on that subinterval.
According to (c), the linear independence of the scalar-valued solutions $\cos t$ and
$\sin t$ is reflected by the linear independence of the column vectors $\begin{pmatrix} \cos t_0 \\ -\sin t_0 \end{pmatrix}$ and
$\begin{pmatrix} \sin t_0 \\ \cos t_0 \end{pmatrix}$ for any $t_0$ in $(-\infty, +\infty)$. The latter independence we can see immediately
by observing that the matrix $\begin{pmatrix} \cos t_0 & \sin t_0 \\ -\sin t_0 & \cos t_0 \end{pmatrix}$ has determinant equal to 1 and not 0.

The kind of matrix formed in the previous example is a useful tool when
generalized to an arbitrary homogeneous linear system, and it has a customary
name. Let $v_1(t), \dots, v_n(t)$ be solutions of an n-by-n homogeneous linear system
$y' = A(t)y$ with $A(t)$ continuous. The Wronskian matrix of $v_1, \dots, v_n$ is the
n-by-n matrix whose $j^{\text{th}}$ column is $v_j$. If $v_{ij}$ denotes the $i^{\text{th}}$ entry of the $j^{\text{th}}$
solution, then

\[
W(t) = \begin{pmatrix}
v_{11}(t) & \cdots & v_{1n}(t) \\
\vdots & & \vdots \\
v_{n1}(t) & \cdots & v_{nn}(t)
\end{pmatrix}.
\]

Since each column of $W(t)$ is a solution, we obtain the matrix identity $W'(t) =
A(t)W(t)$.


EXAMPLE, CONTINUED. In the case of the single second-order equation
$y'' + y = 0$, we listed two linearly independent scalar-valued solutions as $\cos t$
and $\sin t$. When the equation is converted into a 2-by-2 homogeneous linear
system, the Wronskian matrix is

\[ W(t) = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}. \]

For a general $n^{\text{th}}$-order equation with $v_1, \dots, v_n$ as scalar-valued solutions, the
Wronskian matrix of the associated system is

\[
W(t) = \begin{pmatrix}
v_1(t) & \cdots & v_n(t) \\
v_1'(t) & \cdots & v_n'(t) \\
\vdots & & \vdots \\
v_1^{(n-1)}(t) & \cdots & v_n^{(n-1)}(t)
\end{pmatrix}.
\]


Proposition 4.7. If $v_1(t), \dots, v_n(t)$ are solutions on an interval of an n-by-n
homogeneous linear system $y' = A(t)y$ with $A(t)$ continuous, then the following
are equivalent:

(a) $v_1, \dots, v_n$ are linearly independent solutions,
(b) $\det W(t)$ is nowhere 0,
(c) $\det W(t)$ is somewhere nonzero.


PROOF. By Theorem 4.6c, (a) here is equivalent to the linear independence
of $v_1(t_0), \dots, v_n(t_0)$, no matter what $t_0$ we choose, hence is equivalent to the
condition $\det W(t_0) \ne 0$, no matter what $t_0$ we choose. The proposition follows.
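
As a small numerical illustration of Proposition 4.7, the following Python sketch evaluates the Wronskian determinant for the two solutions of y'' + y = 0 from the example, and also for a linearly dependent pair. The function names are hypothetical, and a floating-point check at a few sample points is of course only a sanity check, not a proof.

```python
import math


def wronskian_det(t):
    """det W(t) for the solutions (cos t, -sin t) and (sin t, cos t)."""
    w = [[math.cos(t), math.sin(t)],
         [-math.sin(t), math.cos(t)]]
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]


def dependent_det(t):
    """Determinant when the second column is twice the first solution."""
    w = [[math.cos(t), 2 * math.cos(t)],
         [-math.sin(t), -2 * math.sin(t)]]
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]


for t in (-2.0, 0.0, 0.7, 3.1):
    # The first determinant stays at 1 (nowhere 0); the second vanishes everywhere.
    print(t, wronskian_det(t), dependent_det(t))
```

The dichotomy is the point: for solutions of one system the determinant is either nowhere 0 or identically 0.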


We shall use the Wronskian matrix of a homogeneous system to analyze the 
solutions of any corresponding inhomogeneous system. 


Proposition 4.8. For an inhomogeneous linear system $y' = A(t)y + Q(t)$
with $A(t)$ and $Q(t)$ continuous for $a < t < b$, any solution $y^*(t)$ on a subinterval
$(a', b')$ of $(a, b)$ extends to be a solution on $(a, b)$, and the most general solution
$y(t)$ is of the form $y(t) = h(t) + y^*(t)$, where $y^*(t)$ is one solution of $y' =
A(t)y + Q(t)$ and $h(t)$ is an arbitrary solution of the homogeneous system $y' =
A(t)y$.


PROOF. If $y^*$ and $y^{**}$ are two solutions of $y' = A(t)y + Q(t)$ on $(a', b')$, then

\[ (y^{**} - y^*)'(t) = (A(t)y^{**}(t) + Q(t)) - (A(t)y^*(t) + Q(t)) = A(t)(y^{**} - y^*)(t), \]

and $h = y^{**} - y^*$ solves $y' = A(t)y$ on $(a', b')$. Conversely if $h$ solves $y' =
A(t)y$ on $(a', b')$, then

\[
\begin{aligned}
(y^* + h)'(t) &= (y^*)'(t) + h'(t) \\
&= (A(t)y^*(t) + Q(t)) + A(t)h(t) = A(t)(y^* + h)(t) + Q(t),
\end{aligned}
\]

and $y^* + h$ is a solution of $y' = A(t)y + Q(t)$ on $(a', b')$.

We are left with showing that any solution $y^*$ of $y' = A(t)y + Q(t)$ on $(a', b')$
extends to a solution on $(a, b)$. As in the proof of Theorem 4.6a, we can form
unions of functions and thereby assume that $y^*$ cannot be extended to be a solution
on a larger interval. The claim is that $(a', b') = (a, b)$. Assuming the contrary,
suppose, for example, that $b' < b$. By the existence theorem (Theorem 4.1), there
exists a solution $y^{**}(t)$ of $y' = A(t)y + Q(t)$ for $|t - b'| < \varepsilon$ if $\varepsilon$ is small enough.
By the result of the previous paragraph, $y^*(t) = y^{**}(t) + h(t)$ on $(b' - \varepsilon, b')$ for a
suitable choice of $h$ that solves the homogeneous system $y' = A(t)y$ on $(b' - \varepsilon, b')$.
Since $y^{**}(t)$ is given as a solution of $y' = A(t)y + Q(t)$ on $(b' - \varepsilon, b' + \varepsilon)$ and since,
by Theorem 4.6a, $h(t)$ extends to a solution of $y' = A(t)y$ on $(b' - \varepsilon, b' + \varepsilon)$, we
see that $y^{**}(t) + h(t)$ extends to a solution of $y' = A(t)y + Q(t)$ on $(b' - \varepsilon, b' + \varepsilon)$.
Then the function equal to $y^*(t)$ on $(a', b')$ and to $y^{**}(t) + h(t)$ on $(b' - \varepsilon, b' + \varepsilon)$
extends $y^*(t)$ to a solution of $y' = A(t)y + Q(t)$ on a larger interval, namely
$(a', b' + \varepsilon)$. We obtain a contradiction and conclude that $b'$ must have equaled
$b$. Similarly $a'$ must equal $a$. Thus every solution of $y' = A(t)y + Q(t)$ on a
subinterval extends to all of $(a, b)$, and the proof is complete.


Theorem 4.9 (variation of parameters). For an inhomogeneous linear system
$y' = A(t)y + Q(t)$ with $A(t)$ and $Q(t)$ continuous for $a < t < b$, let $v_1, \dots, v_n$
be linearly independent solutions of $y' = A(t)y$ on $(a, b)$, and let $W(t)$ be their
Wronskian matrix. Then a particular solution $y^*$ of $y' = A(t)y + Q(t)$ on $(a, b)$
is given by

\[ y^*(t) = W(t)u(t), \qquad \text{where } W(t)u'(t) = Q(t). \]

That is,

\[ y^*(t) = W(t)\int^t W(s)^{-1}Q(s)\,ds. \]


REMARKS. Linearly independent solutions $v_1, \dots, v_n$ as in the statement exist
by Theorem 4.6.


PROOF. For any differentiable vector-valued function $u(t)$, $y^*(t) = W(t)u(t)$
has

\[ (y^*)' = W'u + Wu' = AWu + Wu' = Ay^* + Wu'. \]

Thus $y^*$ will have $(y^*)' = Ay^* + Q$ if and only if $Wu' = Q$. Since Proposition
4.7 shows that $W(t)^{-1}$ exists and is continuous, we can solve $Wu' = Q$ for $u$.


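EXAMPLE, CONTINUED. Now consider the single second-order inhomogeneous
linear equation $y'' + y = \tan t$ on the interval $|t| < \pi/2$. We saw that we can
take $W(t) = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}$. We set up the system

\[ \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix} \begin{pmatrix} u_1' \\ u_2' \end{pmatrix} = \begin{pmatrix} 0 \\ \tan t \end{pmatrix} \]

of algebraic linear equations and solve for $u_1'$ and $u_2'$:

\[ \begin{pmatrix} u_1' \\ u_2' \end{pmatrix} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} \begin{pmatrix} 0 \\ \tan t \end{pmatrix} = \begin{pmatrix} -\dfrac{\sin^2 t}{\cos t} \\ \sin t \end{pmatrix}. \]

A vector-valued function with derivative $\begin{pmatrix} u_1' \\ u_2' \end{pmatrix}$ for $|t| < \pi/2$ is

\[ \begin{pmatrix} u_1(t) \\ u_2(t) \end{pmatrix} = \begin{pmatrix} \sin t - \tfrac{1}{2}\log\dfrac{1 + \sin t}{1 - \sin t} \\ -\cos t \end{pmatrix}, \]

and we thus take $y^*(t) = (\cos t)u_1(t) + (\sin t)u_2(t)$. The most general solution
of the given inhomogeneous equation is therefore $y^*(t) + c_1\cos t + c_2\sin t$.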
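
The particular solution in this example can be checked numerically. The sketch below assumes the antiderivatives u_1(t) = sin t - (1/2) log((1 + sin t)/(1 - sin t)) and u_2(t) = -cos t, whose derivatives are -sin^2 t / cos t and sin t, and it estimates y'' by a central difference; the function names and the step size h are choices made for this illustration only.

```python
import math


def u1(t):
    # Antiderivative of -sin(t)**2 / cos(t) on |t| < pi/2.
    return math.sin(t) - 0.5 * math.log((1 + math.sin(t)) / (1 - math.sin(t)))


def u2(t):
    # Antiderivative of sin(t).
    return -math.cos(t)


def y_star(t):
    """Particular solution y* = (cos t) u1 + (sin t) u2 from variation of parameters."""
    return math.cos(t) * u1(t) + math.sin(t) * u2(t)


def residual(t, h=1e-4):
    """Central-difference estimate of y*'' + y* - tan t; should be close to 0."""
    second = (y_star(t + h) - 2 * y_star(t) + y_star(t - h)) / (h * h)
    return second + y_star(t) - math.tan(t)


for t in (-1.2, -0.4, 0.0, 0.5, 1.2):
    print(t, residual(t))
```

The residuals are small but not exactly 0, because the second derivative is only approximated.
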

6. Homogeneous Equations with Constant Coefficients


In this section and the next, we discuss first-order homogeneous linear systems
with constant coefficients. The system is of the form $y' = Ay$ with $A$ a matrix
of constants. A single homogeneous $n^{\text{th}}$-order linear equation with constant
coefficients can be converted into such a first-order system and can therefore be
handled by the method applicable to all first-order homogeneous linear systems
with constant coefficients. But such an equation can be handled more simply in a
direct fashion, and we therefore isolate in this section the case of a single $n^{\text{th}}$-order
equation. This section and the next will make use of material on polynomials
from Section A8 of the appendix.
The equation to be studied in this section is of the form

\[ y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y' + a_0y = 0 \]

with coefficients in $\mathbb{C}$. Let us write this equation as $L(y) = 0$ for a suitable linear
operator $L$ defined on functions $y$ of class $C^n$:

\[ L = \Big(\frac{d}{dt}\Big)^{n} + a_{n-1}\Big(\frac{d}{dt}\Big)^{n-1} + \cdots + a_1\Big(\frac{d}{dt}\Big) + a_0. \]

The term $a_0$ is understood to act as $a_0$ times the identity operator. Since $\frac{d}{dt}e^{rt} =
re^{rt}$, we immediately obtain

\[ L(e^{rt}) = (r^n + a_{n-1}r^{n-1} + \cdots + a_1r + a_0)e^{rt}. \]

The polynomial

\[ P(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 \]

is called the characteristic polynomial of the equation, and the formula $L(e^{rt}) =
P(r)e^{rt}$ shows that $y(t) = e^{rt}$ is a solution of $L(y) = 0$ if and only if $r$ is a root
of the characteristic polynomial. From Section A8 of the appendix, we know that
the polynomial $P(\lambda)$ factors into the product of linear factors $\lambda - r$, the factors
being unique apart from their order. Let us list the distinct roots, i.e., the distinct
such complex numbers $r$, as $r_1, \dots, r_k$ with $k \le n$, and let us write $m_j$ for the
number of times that $\lambda - r_j$ occurs as a factor of $P(\lambda)$, i.e., the multiplicity of $r_j$
as a root of $P$. Then we have $\sum_{j=1}^{k} m_j = n$ and

\[ P(\lambda) = \prod_{j=1}^{k} (\lambda - r_j)^{m_j}. \]

Corresponding to this factorization of $P$ is a factorization of $L$ as

\[ L = \prod_{j=1}^{k} \Big(\frac{d}{dt} - r_j\Big)^{m_j}. \]

On the right side the individual factors commute with each other because
differentiation commutes with itself and with multiplication by constants. The following
lemma therefore produces $n$ solutions of the given equation $L(y) = 0$.


Lemma 4.10. For $m \ge 1$ and $r$ in $\mathbb{C}$, all the functions $e^{rt}, te^{rt}, \dots, t^{m-1}e^{rt}$
are solutions of the $m^{\text{th}}$-order differential equation

\[ \Big(\frac{d}{dt} - r\Big)^{m}(y) = 0. \]


PROOF. Direct computation gives $\big(\frac{d}{dt} - r\big)(t^ke^{rt}) = kt^{k-1}e^{rt}$, and hence
$\big(\frac{d}{dt} - r\big)^{m}(t^ke^{rt}) = k(k-1)\cdots(k-m+1)t^{k-m}e^{rt}$. The right side is 0 if
$0 \le k \le m - 1$, and the lemma follows.


Lemma 4.11. Let $r_1, \dots, r_N$ be distinct complex numbers, and let $m_1, \dots, m_N$ be
integers $\ge 1$. Then the $\sum_{j=1}^{N} m_j$ functions

\[ e^{r_jt},\ te^{r_jt},\ \dots,\ t^{m_j-1}e^{r_jt}, \qquad 1 \le j \le N, \]

are linearly independent over $\mathbb{C}$.


PROOF. Let $k \ge 1$ be an integer, let $r$ be a complex number, and let $P(t)$ be a
polynomial of degree $\le k - 1$. We allow $P(t)$ to be the 0 polynomial. Then

\[ \frac{d}{dt}\big[(t^k + P(t))e^{rt}\big] = r(t^k + P(t))e^{rt} + (kt^{k-1} + P'(t))e^{rt}, \]

from which it follows that

\[ \frac{d}{dt}\big[(t^k + P(t))e^{rt}\big] = (rt^k + Q(t))e^{rt} \tag{$*$} \]

with $Q(t)$ a polynomial of degree $\le k - 1$ or the 0 polynomial.

We shall prove by induction on $N$ that if $P_1, \dots, P_N$ are polynomials with
complex coefficients such that $\sum_{j=1}^{N} P_j(t)e^{r_jt}$ is the 0 function, then all the $P_j$ are
0 polynomials. For $N = 1$, if $P_1(t)e^{r_1t}$ is the 0 function, then $P_1(t)$ is the 0 function.
Since a nonzero polynomial of degree $k \ge 0$ has at most $k$ roots, we conclude that $P_1$ has
all coefficients 0. This disposes of the assertion for $N = 1$. Assume the result
for $N - 1$, and suppose that we are given that $\sum_{j=1}^{N-1} P_j(t)e^{r_jt} + P_N(t)e^{r_Nt}$ is the
0 function, where $\{r_1, \dots, r_{N-1}, r_N\}$ are distinct. Then

\[ \sum_{j=1}^{N-1} P_j(t)e^{q_jt} + P_N(t) \tag{$**$} \]

is the 0 function when $q_j = r_j - r_N$ for $j \le N - 1$. If $P_N$ is the 0 polynomial,
the inductive hypothesis shows that all $P_j$ with $j \le N - 1$ are 0 polynomials.
Otherwise let $P_N$ have degree $d$, and differentiate $(**)$ $d + 1$ times. If $P_j(t)$ for
$j \le N - 1$ is the sum of $a_{n_j}t^{n_j}$ plus lower-degree terms, then $(*)$ shows that the
result of the differentiation is that

\[ \sum_{j=1}^{N-1} \big(a_{n_j}(q_j)^{d+1}t^{n_j} + \text{lower-degree terms}\big)e^{q_jt} \]

is the 0 function. By the inductive hypothesis and the fact that each $q_j$ is nonzero,
each $a_{n_j}$ has to be 0, and hence all
coefficients of each $P_j$ have to be 0 for $j \le N - 1$. Then $P_N(t)$ is identically 0
and must be the 0 polynomial. This completes the induction.

If we are given a linear combination of the functions in the statement of
the lemma that equals the 0 function, then we obtain a relation of the form
$\sum_{j=1}^{N} P_j(t)e^{r_jt} = 0$, and we have just seen that this relation forces all $P_j$ to be 0
polynomials. This completes the proof.


Proposition 4.12. Let the differential equation

\[ y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y' + a_0y = 0, \]

with complex coefficients, have characteristic polynomial given by $P(\lambda) =
\prod_{j=1}^{k} (\lambda - r_j)^{m_j}$ with $r_1, \dots, r_k$ distinct complex numbers and with the $m_j$ integers
$> 0$ such that $\sum_{j=1}^{k} m_j = n$. Then the $n$ functions

\[ e^{r_jt},\ te^{r_jt},\ \dots,\ t^{m_j-1}e^{r_jt}, \qquad 1 \le j \le k, \]

form a basis over $\mathbb{C}$ of the space of solutions of the given equation on any interval.


PROOF. Lemma 4.10 shows that the functions in question are solutions, Lemma
4.11 shows that they are linearly independent, and Theorem 4.6 shows that
the dimension of the space of solutions on any interval is $n$. Since $n$ linearly
independent solutions have been exhibited, they must form a basis of the space
of solutions.
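
To see Proposition 4.12 at work on one concrete equation, consider y''' - 3y' - 2y = 0, whose characteristic polynomial is λ^3 - 3λ - 2 = (λ + 1)^2 (λ - 2). The proposition predicts the basis e^{-t}, te^{-t}, e^{2t}. The following Python sketch checks this by evaluating L on functions t^k e^{rt}, using the Leibniz rule for the derivatives; the equation, the function names, and the coefficient-list convention `[a_0, ..., a_{n-1}]` are all choices made for this illustration.

```python
import math


def deriv(k, r, n, t):
    """n-th derivative of t**k * exp(r*t) at t, by the Leibniz rule."""
    total = 0.0
    for i in range(min(n, k) + 1):
        falling = math.factorial(k) // math.factorial(k - i)  # k(k-1)...(k-i+1)
        total += math.comb(n, i) * falling * t ** (k - i) * r ** (n - i)
    return total * math.exp(r * t)


def apply_L(coeffs, k, r, t):
    """Evaluate y^(n) + a_{n-1} y^(n-1) + ... + a_0 y at t for y = t**k * exp(r*t).

    coeffs = [a_0, ..., a_{n-1}].
    """
    n = len(coeffs)
    return deriv(k, r, n, t) + sum(a * deriv(k, r, j, t) for j, a in enumerate(coeffs))


coeffs = [-2, -3, 0]  # y''' + 0*y'' - 3*y' - 2*y
for k, r in [(0, -1), (1, -1), (0, 2)]:
    print(k, r, [apply_L(coeffs, k, r, t) for t in (0.0, 0.5, 1.3)])

# t^2 e^{-t} is not in the predicted basis, and L does not annihilate it.
print(apply_L(coeffs, 2, -1, 0.0))
```

The first three lines of output are 0 up to rounding, while the last value is nonzero, consistent with the root -1 having multiplicity exactly 2.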


If the equation in Proposition 4.12 happens to have real coefficients, it is
meaningful to ask for a basis over $\mathbb{R}$ of the space of real-valued solutions. Since the
coefficients are real, we have $\overline{L(y)} = L(\bar{y})$ for all complex-valued functions $y$ of
class $C^n$, and it follows that the complex conjugate of any complex-valued solution
is again a solution. Thus the real and imaginary parts of any complex-valued
solution are real-valued solutions. Meanwhile, the characteristic polynomial $P$
of the equation has real coefficients, and it follows that the set of roots of $P$ is
closed under complex conjugation. In addition, the multiplicity of a root equals
the multiplicity of its complex conjugate. For any integer $k \ge 0$ and complex
number $a + bi$ with $b \ne 0$, we have

\[ \mathbb{C}\,t^ke^{(a+bi)t} + \mathbb{C}\,t^ke^{(a-bi)t} = \mathbb{C}\,t^ke^{at}\cos bt + \mathbb{C}\,t^ke^{at}\sin bt. \]

Thus $t^ke^{at}\cos bt$ and $t^ke^{at}\sin bt$ form a basis over $\mathbb{C}$ of the space spanned by
$t^ke^{(a+bi)t}$ and $t^ke^{(a-bi)t}$. The functions $t^ke^{at}\cos bt$ and $t^ke^{at}\sin bt$ are real-valued,
and thus we obtain a basis over $\mathbb{C}$ consisting of real-valued solutions of the
given equation if we retain the solutions $t^ke^{rt}$ with $r$ real and we replace any
pair $t^ke^{(a+bi)t}$ and $t^ke^{(a-bi)t}$ of solutions, $b \ne 0$, by the pair $t^ke^{at}\cos bt$ and
$t^ke^{at}\sin bt$.

Let us see that these resulting functions form a basis over $\mathbb{R}$ of the real vector
space of real-valued solutions. In fact, we know that they are linearly independent
over $\mathbb{R}$ because they are linearly independent over $\mathbb{C}$. To see that they span, we
take any real-valued solution and expand it as a complex linear combination
of these functions. The imaginary part of this expansion exhibits 0 as a linear
combination of the given functions, and the coefficients must be 0 by linear
independence. Thus the constructed functions form a basis over $\mathbb{R}$ of the space
of real-valued solutions.


7. Homogeneous Systems with Constant Coefficients 


Having discussed linear homogeneous equations with constant coefficients, let
us pass to the more general case of first-order homogeneous linear systems with
constant coefficients. We write the system as $y' = Ay$ with $A$ an n-by-n matrix of
constants. In principle we can solve the system immediately. Namely, Proposition
3.13c tells us that $\frac{d}{dt}(e^{tA}) = Ae^{tA}$, so that each of the $n$ columns of $e^{tA}$ is a solution
of $y' = Ay$. At $t = 0$, $e^{tA}$ reduces to the identity matrix, and thus these $n$ solutions
are linearly independent at $t = 0$. By Theorem 4.6 these $n$ solutions form a basis
of all solutions on any subinterval $(a, b)$ of $(-\infty, +\infty)$. The solution satisfying
the initial condition $y(t_0) = y_0$ is $y(t) = e^{tA}e^{-t_0A}y_0$, which is the particular
linear combination $\sum_{j=1}^{n} c_je^{tA}e_j$ of the columns of $e^{tA}$ in which $c_j$ is the number
$c_j = (e^{-t_0A}y_0)_j$.
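
The formula for the solution with y(t_0) = y_0 can be tried out numerically. The Python sketch below approximates the matrix exponential by a truncated power series, which is adequate for small matrices and moderate t but is not a recommended general-purpose algorithm; the function names and the number of terms are assumptions of this illustration. For the matrix of the system coming from y'' + y = 0, the exponential is the rotation-type matrix with rows (cos s, sin s) and (-sin s, cos s).

```python
import math


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(A, t, terms=40):
    """Truncated series sum_{k < terms} (tA)^k / k!, an approximation to e^{tA}."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term (tA)^k / k!
    tA = [[t * a for a in row] for row in A]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, tA)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def solve_constant_system(A, t0, y0, t):
    """y(t) = e^{(t - t0)A} y0, the solution of y' = Ay with y(t0) = y0."""
    E = mat_exp(A, t - t0)
    return [sum(E[i][j] * y0[j] for j in range(len(y0))) for i in range(len(y0))]


A = [[0.0, 1.0], [-1.0, 0.0]]
s = 2.0 - 0.5
y = solve_constant_system(A, 0.5, [1.0, 2.0], 2.0)
expected = [math.cos(s) + 2 * math.sin(s), -math.sin(s) + 2 * math.cos(s)]
print(y, expected)
```

The sketch uses $e^{tA}e^{-t_0A} = e^{(t-t_0)A}$, which holds because $tA$ and $-t_0A$ commute.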

In practice it is not so obvious how to compute $e^{tA}$ except in special cases in
which the exponential series can be summed entry by entry. Let us write down
three model cases of this kind, and ultimately we shall see that we can handle
general $A$ by working suitably with these cases.


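MODEL CASES.
(1) Let

\[
C = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
  & 0 & 1 & \cdots & 0 & 0 \\
  &   & \ddots & \ddots &  & \vdots \\
  &   &   & 0 & 1 & 0 \\
  &   &   &   & 0 & 1 \\
  &   &   &   &   & 0
\end{pmatrix}
\]

be of size m-by-m with 0's below the main diagonal. Raising $C$ to powers, we
see that the $(i, j)^{\text{th}}$ entry of $C^k$ is 1 if $j = i + k$ and is 0 otherwise. Hence

\[
e^{tC} = \begin{pmatrix}
1 & t & \frac{1}{2!}t^2 & \cdots & \frac{1}{(m-2)!}t^{m-2} & \frac{1}{(m-1)!}t^{m-1} \\
  & 1 & t & \cdots & \frac{1}{(m-3)!}t^{m-3} & \frac{1}{(m-2)!}t^{m-2} \\
  &   & \ddots & \ddots &  & \vdots \\
  &   &   & 1 & t & \frac{1}{2!}t^2 \\
  &   &   &   & 1 & t \\
  &   &   &   &   & 1
\end{pmatrix}
\]

with 0's below the main diagonal.

(2) Let

\[
A = \begin{pmatrix}
a & 1 & 0 & \cdots & 0 & 0 \\
  & a & 1 & \cdots & 0 & 0 \\
  &   & \ddots & \ddots &  & \vdots \\
  &   &   & a & 1 & 0 \\
  &   &   &   & a & 1 \\
  &   &   &   &   & a
\end{pmatrix}
\]

be of size m-by-m,
so that $A = a1 + C$ with $C$ as in the previous case. Since $a1$ and $C$ commute,
Proposition 3.13a shows that $e^{tA} = e^{at}e^{tC}$. In other words, $e^{tA}$ is obtained by
multiplying every entry of the matrix $e^{tC}$ in the previous case by $e^{at}$. A matrix
of this form $A$ for some complex constant $a$ and for some size $m$ is said to be a
Jordan block. Thus we know how to form $e^{tA}$ if $A$ is a Jordan block.

(3) Let $A$ be block diagonal with each block being a Jordan block:

\[
A = \begin{pmatrix}
\text{block \#1} & & & \\
 & \text{block \#2} & & \\
 & & \ddots & \\
 & & & \text{block \#}k
\end{pmatrix}.
\]

Then

\[
e^{tA} = \begin{pmatrix}
e^{t\,\text{block \#1}} & & & \\
 & e^{t\,\text{block \#2}} & & \\
 & & \ddots & \\
 & & & e^{t\,\text{block \#}k}
\end{pmatrix}.
\]

Thus we know how to form $e^{tA}$ if $A$ is block diagonal with each block being a
Jordan block. A matrix $A$ of this kind is said to be in Jordan form.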
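
Model Case 1 can be verified by exact arithmetic. The Python sketch below builds the matrix with entries $t^{j-i}/(j-i)!$ on and above the diagonal and compares it with the finite series $\sum_{k=0}^{m-1} (tC)^k/k!$, which is the whole exponential series because $C^m = 0$. The function names are hypothetical, and rational values of t are used so that the comparison is exact.

```python
from fractions import Fraction
from math import factorial


def exp_t_nilpotent(m, t):
    """Closed form for e^{tC}: entry (i, j) is t^(j-i)/(j-i)! if j >= i, else 0."""
    return [[Fraction(t) ** (j - i) / factorial(j - i) if j >= i else Fraction(0)
             for j in range(m)] for i in range(m)]


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_series(m, t):
    """e^{tC} from the series; only the terms k = 0, ..., m-1 are nonzero."""
    C = [[Fraction(1) if j == i + 1 else Fraction(0) for j in range(m)]
         for i in range(m)]
    power = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]  # C^0
    total = [[Fraction(0)] * m for _ in range(m)]
    for k in range(m):
        scale = Fraction(t) ** k / factorial(k)
        total = [[total[i][j] + scale * power[i][j] for j in range(m)]
                 for i in range(m)]
        power = mat_mul(power, C)
    return total


print(exp_series(4, Fraction(3, 2)) == exp_t_nilpotent(4, Fraction(3, 2)))  # True
```

For a Jordan block with diagonal entry $a$, Model Case 2 then says to multiply every entry by $e^{at}$.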

The following theorem reduces any computation of a matrix $e^{tA}$ to this case.


Theorem 4.13 (Jordan normal form). For any square matrix $A$ with complex
entries, there exists a nonsingular complex matrix $B$ such that $B^{-1}AB = J$ is in
Jordan form.


REMARKS. This theorem comes from linear algebra, but knowledge of it is
beyond the algebra prerequisites for this book. The proof is long and is not in the
spirit of this text, and we shall omit it; however, the interested reader can find a
proof in many linear algebra books. As a practical matter, the proof will not give
us any additional information, since we already know that $e^{tA}$ yields the solutions
to $y' = Ay$ and the only remaining question is to convert the statement of the
theorem into an explicit method of computation.


Let us see what Theorem 4.13 accomplishes. The solution of $y' = Ay$ with
$y(t_0) = y_0$ is $y(t) = e^{(t-t_0)A}y_0$. Write $B^{-1}AB = J$ as in the theorem. Then
Proposition 3.13d gives

\[
\begin{aligned}
y(t) = e^{(t-t_0)A}y_0 &= B(B^{-1}e^{(t-t_0)A}B)B^{-1}y_0 \\
&= Be^{(t-t_0)B^{-1}AB}B^{-1}y_0 = Be^{(t-t_0)J}B^{-1}y_0.
\end{aligned}
\]

If we can compute $J$, then Model Case 3 above tells us what $e^{(t-t_0)J}$ is. If we can
compute $B$ also, then we recover $y(t)$ explicitly.

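The practical effect is that Theorem 4.13 gives us a method for calculating
solutions. The idea behind the method is that the qualitative properties of $B$ and
$J$ forced by the theorem are enough to lead us to explicit values of $B$ and $J$. Let
us go through the steps. A concrete example of $J$ is the block-diagonal matrix

\[
J = \operatorname{diag}\!\left(
\begin{pmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix},\
\begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix},\
\begin{pmatrix} a & 1 \\ 0 & a \end{pmatrix},\
(a),\ J' \right),
\]

in which $J'$ is in Jordan form with all diagonal entries different from $a$.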

It is helpful to know the extent of uniqueness in Theorem 4.13. The matrix $J$ is
actually unique up to permuting the order of the Jordan blocks. The matrix $B$ is
not at all unique but results from finding bases of certain subspaces of $\mathbb{C}^n$. The
first step is to form the characteristic polynomial² $P(\lambda) = \det(\lambda 1 - A)$ of $A$.
We have

\[
\begin{aligned}
\det(\lambda 1 - J) &= \det(\lambda 1 - B^{-1}AB) = \det(B^{-1}(\lambda 1 - A)B) \\
&= \det(B)^{-1}\det(\lambda 1 - A)\det(B) = \det(\lambda 1 - A),
\end{aligned}
\]

and thus $J$ has the same characteristic polynomial as $A$. The characteristic
polynomial of $J$ is just the product of expressions $\lambda - d$ as $d$ runs through the
diagonal entries of $J$. According to Section A8 of the appendix, the factorization
of a polynomial with complex coefficients and with leading coefficient 1 into
first-degree expressions $\lambda - c$ is unique up to order, and thus the factorization of
$P(\lambda)$ tells us the diagonal entries of $J$. We still need to know the sizes of the
individual Jordan blocks.

The sizes of the Jordan blocks come from computing dimensions of various
null spaces, or kernels in the terminology of linear functions. If $a$ occurs as a
diagonal entry of $J$, think of forming $J - a1$ and its powers, and consider the
dimension of the kernel of each power. For example, with the explicit matrix $J$
that is written above, we have

\[
J - a1 = \operatorname{diag}\!\left(
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},\
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},\
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},\
(0),\ \text{nonsingular} \right),
\]

and $\dim\ker(J - a1)$ is the number of Jordan blocks of size $\ge 1$ with $a$ on the
diagonal, namely 4 in this case. Next we consider $(J - a1)^2$. In this case,

\[
(J - a1)^2 = \operatorname{diag}\!\left(
\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},\
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},\
(0),\ \text{nonsingular} \right),
\]

and $\dim\ker(J - a1)^2 = 7$. This number arises as the sum of the previous number
and the number of Jordan blocks of size $\ge 2$ with $a$ on the diagonal. Thus
$\dim\ker(J - a1)^2 - \dim\ker(J - a1)$ in general is the number of Jordan blocks of
size $\ge 2$ with $a$ on the diagonal. Finally we consider $(J - a1)^3$. In this case, the
upper left part of $(J - a1)^3$ corresponding to diagonal entry $a$ is all 0, and the lower
right part is nonsingular; hence $\dim\ker(J - a1)^3 = 8$. This number arises as the
sum of the previous number and the number of Jordan blocks of size $\ge 3$ with $a$
on the diagonal. Thus in general, $\dim\ker(J - a1)^3 - \dim\ker(J - a1)^2$ is the
number of Jordan blocks of size $\ge 3$ with $a$ on the diagonal. In our example, the
number $\dim\ker(J - a1)^k$ remains at 8 for all $k \ge 3$ because 8 is the multiplicity
of $a$ as a root of $P(\lambda)$, and we are therefore done with diagonal entry $a$; our
computation has shown that the numbers of Jordan blocks of sizes $1, 2, 3, 4, \dots$
are $1, 2, 1, 0, \dots$, and a check on the computation is that $1(1) + 2(2) + 3(1) = 8$.

² Many books write the characteristic polynomial as $\det(A - \lambda 1)$, which is the same as the present
polynomial if $n$ is even but is its negative if $n$ is odd. The present notation has the advantage that
the notions of characteristic polynomial here and in the previous section coincide when an $n^{\text{th}}$-order
equation is converted into a first-order system.

Of course, we do not have $J$ at our disposal for these calculations, but $A$ yields
the same numbers. In fact, we have $B(J - a1)^kB^{-1} = (A - a1)^k$, from which
we see that $x \in \ker(A - a1)^k$ if and only if $B^{-1}x \in \ker(J - a1)^k$. Hence

\[ B(\ker(J - a1)^k) = \ker(A - a1)^k. \]

Since $B$ is nonsingular, the dimension of the kernel of $(J - a1)^k$ equals the
dimension of the kernel of $(A - a1)^k$. Consequently

\[
\begin{aligned}
\dim\ker(A - a1) &= \#\{\text{Jordan blocks of size} \ge 1 \text{ with } a \text{ on diagonal}\}, \\
\dim\ker(A - a1)^2 - \dim\ker(A - a1) &= \#\{\text{Jordan blocks of size} \ge 2 \text{ with } a \text{ on diagonal}\}, \\
\dim\ker(A - a1)^3 - \dim\ker(A - a1)^2 &= \#\{\text{Jordan blocks of size} \ge 3 \text{ with } a \text{ on diagonal}\},
\end{aligned}
\]

etc.


Repeating this argument with the other roots of $P(\lambda)$, we find that we can
determine $J$ completely.
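
The counting rule just derived translates directly into a computation with ranks. The Python sketch below is one possible implementation under stated assumptions: it works over the rationals with exact arithmetic, the function names are hypothetical, and the root $a$ is supplied by the caller rather than found by factoring $P(\lambda)$. It is applied to the 3-by-3 matrix with rows (4, 1, -1), (-8, -2, 2), (8, 2, -2), whose only characteristic root is 0.

```python
from fractions import Fraction


def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def blocks_of_size_at_least(A, a):
    """Return [d_1, d_2, ...] with d_k = dim ker(A - a1)^k - dim ker(A - a1)^(k-1).

    d_k is the number of Jordan blocks of size >= k with a on the diagonal.
    """
    n = len(A)
    N = [[Fraction(A[i][j]) - (a if i == j else 0) for j in range(n)]
         for i in range(n)]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    counts, prev = [], 0
    for _ in range(n):
        power = mat_mul(power, N)
        dim = n - rank(power)      # dimension of the kernel of (A - a1)^k
        if dim == prev:            # kernels have stabilized
            break
        counts.append(dim - prev)
        prev = dim
    return counts


A = [[4, 1, -1], [-8, -2, 2], [8, 2, -2]]
print(blocks_of_size_at_least(A, 0))  # [2, 1]
```

The output says that there are 2 Jordan blocks in all and 1 of size at least 2, so the blocks have sizes 2 and 1.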

Calculating $B$ requires working with vectors rather than dimensions. The
columns of $B$ are just $Be_1, \dots, Be_n$, and we seek a way of finding these. Fix
attention on a root $a$ of $P(\lambda)$. Consider an index $i$ with $1 \le i \le n$, and suppose
that the diagonal entry of $J$ in column $i$ is $a$. From the form of $J$, we see that
either the $i^{\text{th}}$ column of $J - a1$ is 0 or else it is $e_{i-1}$. In the latter case, index $i - 1$
corresponds to the same Jordan block. Using the identity $(A - a1)B = B(J - a1)$,
we see that either

\[
\begin{aligned}
(A - a1)(Be_i) &= B(J - a1)e_i = 0 \\
\text{or} \qquad (A - a1)(Be_i) &= B(J - a1)e_i = Be_{i-1},
\end{aligned}
\]

and index $i - 1$ corresponds to the same Jordan block as index $i$ in the latter
case. Thus the vectors $Be_i$ corresponding to the columns with diagonal entry $a$
and with smallest index for a Jordan block lie in $\ker(A - a1)$. They are linearly
independent since $B$ is nonsingular, and the number of them is the number of
Jordan blocks corresponding to diagonal entry $a$. We saw that this number equals
$\dim\ker(A - a1)$. Hence the vectors $Be_i$ corresponding to the smallest indices
going with each Jordan block form a basis of $\ker(A - a1)$.

Similarly

\[
\begin{aligned}
(A - a1)^2(Be_i) &= B(J - a1)^2e_i = 0 \\
\text{or} \qquad (A - a1)^2(Be_i) &= B(J - a1)^2e_i = Be_{i-2},
\end{aligned}
\]

and index $i - 2$ corresponds to the same Jordan block as index $i$ in the latter case.
Thus the vectors $Be_i$ corresponding to the columns with diagonal entry $a$ and with
smallest or next smallest index for a Jordan block lie in $\ker(A - a1)^2$. They are
linearly independent since $B$ is nonsingular, and the number of them is the sum
of the previously computed number, namely $\dim\ker(A - a1)$, plus the number
of Jordan blocks of size $\ge 2$ that correspond to diagonal entry $a$. We saw that this
sum equals $\dim\ker(A - a1)^2$. Hence the vectors $Be_i$ corresponding to the two
smallest indices going with each Jordan block form a basis of $\ker(A - a1)^2$. The
new vectors $Be_i$ are therefore vectors that we adjoin to a basis of $\ker(A - a1)$ to
obtain a basis of $\ker(A - a1)^2$.

In setting up these vectors properly, however, we have to correlate the indices
studied at the previous step with those being studied now. The relevant formula is
that the new indices $i$ have the property $(A - a1)Be_i = Be_{i-1}$. To obtain vectors
with this consistency property, we would take a basis $S_1$ of $\ker(A - a1)$, extend
it to a basis $S_2$ of $\ker(A - a1)^2$, discard the members of $S_1$, apply $A - a1$ to the
members of $S_2 - S_1$, and extend $(A - a1)(S_2 - S_1)$ to a basis $T_1$ of $\ker(A - a1)$.
Then $S_2' = (S_2 - S_1) \cup T_1$ is a new basis of $\ker(A - a1)^2$.

We can continue the argument in this way. It is perhaps helpful to read the
general discussion of the argument side by side with the explicit example that
appears below. We continue to find that the construction of new basis vectors gets
in the way of the necessary consistency property with the earlier basis vectors.
Thus we really must start with the largest index $k$ such that $\ker(A - a1)^k \ne
\ker(A - a1)^{k-1}$. We extend a basis $S_{k-1}$ of $\ker(A - a1)^{k-1}$ to a basis $S_k$ of
$\ker(A - a1)^k$, and form

\[ (S_k - S_{k-1}) \cup (A - a1)(S_k - S_{k-1}) \cup \cdots \cup (A - a1)^{k-1}(S_k - S_{k-1}). \]

These vectors will be the columns of $B$ corresponding to the largest Jordan blocks
with diagonal entry $a$. The vectors in

\[ (A - a1)^2(S_k - S_{k-1}) \cup \cdots \cup (A - a1)^{k-1}(S_k - S_{k-1}) \]

are linearly independent in $\ker(A - a1)^{k-2}$; we extend this set to a basis $S_{k-2}'$ of
$\ker(A - a1)^{k-2}$, and we extend $S_{k-2}' \cup (A - a1)(S_k - S_{k-1})$ to a basis $S_{k-1}'$ of
$\ker(A - a1)^{k-1}$. The adjoined vectors, together with the result of applying powers
of $A - a1$ to them, will be the columns of $B$ corresponding to the next largest
Jordan blocks with diagonal entry $a$. The process continues until we obtain a
basis of $\ker(A - a1)^k$ with the necessary consistency property throughout. Then
we repeat the process for the other roots of $P(\lambda)$ and assemble the result.


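EXAMPLE. Let

\[ A = \begin{pmatrix} 4 & 1 & -1 \\ -8 & -2 & 2 \\ 8 & 2 & -2 \end{pmatrix}. \]

The characteristic polynomial is $P(\lambda) = \det(\lambda 1 - A) = \lambda^3$, whose factorization is
evidently $P(\lambda) = (\lambda - 0)^3$. Computing the kernel of $A$, we find that $\dim\ker A = 2$,
so that there are 2 Jordan blocks. Also, $A^2 = 0$, so that $\dim\ker A^2 = 3$ and
the number of blocks of size $\geq 2$ is $3 - 2 = 1$. Thus

\[ J = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

We form a basis of $\ker A$ by solving $A\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = 0$. The standard method of row
reduction gives $x_1 = -\tfrac14 x_2 + \tfrac14 x_3$ with $x_2$ and $x_3$ arbitrary, so that a basis of
$\ker A$ consists of $\begin{pmatrix} -1/4 \\ 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1/4 \\ 0 \\ 1 \end{pmatrix}$. We extend this to a basis of $\ker A^2 = \mathbb{C}^3$
by adjoining, for example, the vector $v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$. Then $Av_1 = \begin{pmatrix} 4 \\ -8 \\ 8 \end{pmatrix}$. The
vector $Av_1$ is in $\ker A$, and we extend it to a basis of $\ker A$ by adjoining, for
example, $v_2 = \begin{pmatrix} -1 \\ 4 \\ 0 \end{pmatrix}$. Then $v_1, Av_1, v_2$ form a basis of $\ker A^2 = \mathbb{C}^3$, and the
above general method asks that these vectors be listed in the order $Av_1, v_1, v_2$.

The matrix $B$ is obtained by lining these vectors up as columns:

\[ B = \begin{pmatrix} 4 & 1 & -1 \\ -8 & 0 & 4 \\ 8 & 0 & 0 \end{pmatrix}. \]

The result is easy to check. Computation shows that

\[ B^{-1} = \begin{pmatrix} 0 & 0 & \tfrac18 \\ 1 & \tfrac14 & -\tfrac14 \\ 0 & \tfrac14 & \tfrac14 \end{pmatrix}, \]

and then one can carry out the multiplications to verify that $B^{-1}AB = J$.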
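
The verification can also be done mechanically. The following is a minimal sketch in exact rational arithmetic, assuming the matrix $A$ of the example and the choices $v_1 = (1, 0, 0)$ and $v_2 = (-1, 4, 0)$; the helper names `matmul` and `det3` are illustrative only. It checks the equivalent statement $AB = BJ$ together with $\det B \neq 0$, which avoids computing $B^{-1}$.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det3(M):
    """Determinant of a 3-by-3 matrix by cofactor expansion along row 0."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# The nilpotent matrix of the example.
A = [[Fr(4), Fr(1), Fr(-1)],
     [Fr(-8), Fr(-2), Fr(2)],
     [Fr(8), Fr(2), Fr(-2)]]

# Assumed choices: v1 completes ker A to a basis of ker A^2,
# and v2 completes A v1 to a basis of ker A.
v1 = [Fr(1), Fr(0), Fr(0)]
v2 = [Fr(-1), Fr(4), Fr(0)]
Av1 = [sum(A[i][k] * v1[k] for k in range(3)) for i in range(3)]

# Columns of B are listed in the order A v1, v1, v2.
B = [[Av1[i], v1[i], v2[i]] for i in range(3)]

# Expected Jordan form: one block of size 2 and one of size 1.
J = [[Fr(0), Fr(1), Fr(0)],
     [Fr(0), Fr(0), Fr(0)],
     [Fr(0), Fr(0), Fr(0)]]

# A B = B J together with det B != 0 is equivalent to B^{-1} A B = J.
jordan_ok = matmul(A, B) == matmul(B, J) and det3(B) != 0
```

Any other vector of $\ker A$ that is independent of $Av_1$ could be used for $v_2$; only the third column of $B$ would change.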
8. Series Solutions in the Second-Order Linear Case


In this section we shall consider, in some detail, series solutions for two kinds of
ordinary differential equations.
The first kind is
\[ y'' + P(t)y' + Q(t)y = 0, \]

where $P(t)$ and $Q(t)$ are given by convergent power-series expansions for
$|t| < R$:

\[ P(t) = a_0 + a_1 t + a_2 t^2 + \cdots, \]
\[ Q(t) = b_0 + b_1 t + b_2 t^2 + \cdots. \]

We seek power-series solutions of the form
\[ y(t) = c_0 + c_1 t + c_2 t^2 + \cdots. \]


The same methods and theorem that handle this first kind of equation apply also to
$n^{\text{th}}$-order homogeneous linear equations and to first-order homogeneous systems
when the leading coefficient is 1 and the other coefficients are given by convergent 
power series. The second-order case, however, is by far the most important for 
applications and is sufficiently illustrative that we shall limit our attention to it. 

The idea in finding the solutions is to assume that we have a convergent power- 
series solution y(t) as above, to substitute the series into the equation, and to sort 
out the conditions that are imposed on the unknown coefficients. Our theorems 
on power series in Section I.7 guarantee us that the operations of differentiation 
and multiplication of power series maintain convergence, and thus the result of 
substituting into the equation is that we obtain an equality of a convergent power 
series with 0. Corollary 1.39 then shows that all the coefficients of this last power 
series must be 0, and we obtain recursive equations for the unknown coefficients. 
There is one theorem about the equations under study, and it tells us that the 
power series for y(t) that we obtain by these manipulations is indeed convergent; 
we state and prove this theorem shortly. 

Let us go through the steps of finding the solutions. These steps turn out to be 
clearer when done in complete generality than when done for an example. Thus 
we shall first make the computation in complete generality, then state and prove 
the theorem, and finally consider an important example. The expansions of y(t) 
and its derivatives are 


\[ y(t) = c_0 + c_1 t + c_2 t^2 + \cdots, \]
\[ y'(t) = c_1 + 2c_2 t + 3c_3 t^2 + \cdots, \]
\[ y''(t) = 2c_2 + 3\cdot 2c_3 t + 4\cdot 3c_4 t^2 + \cdots. \]

Substituting all the series into the given equation yields

\[ (2\cdot 1c_2 + 3\cdot 2c_3 t + 4\cdot 3c_4 t^2 + \cdots)
 + (a_0 + a_1 t + a_2 t^2 + \cdots)(c_1 + 2c_2 t + 3c_3 t^2 + \cdots)
 + (b_0 + b_1 t + b_2 t^2 + \cdots)(c_0 + c_1 t + c_2 t^2 + \cdots) = 0. \]

If the series for $y(t)$ converges and if the left side is expanded out, then the
coefficients of each power of $t$ must be 0. Thus

\[ 2\cdot 1c_2 + a_0c_1 + b_0c_0 = 0, \]
\[ 3\cdot 2c_3 + (a_0 2c_2 + a_1c_1) + (b_0c_1 + b_1c_0) = 0, \]
\[ 4\cdot 3c_4 + (a_0 3c_3 + a_1 2c_2 + a_2c_1) + (b_0c_2 + b_1c_1 + b_2c_0) = 0, \]
\[ \vdots \]
\[ n(n-1)c_n + (a_0(n-1)c_{n-1} + a_1(n-2)c_{n-2} + \cdots + a_{n-2}c_1)
 + (b_0c_{n-2} + b_1c_{n-3} + \cdots + b_{n-2}c_0) = 0. \]


These equations tell us that $c_0$ and $c_1$ are arbitrary and that $c_2, c_3, \ldots$ are each
determined by the previous coefficients. Thus $c_2, c_3, \ldots$ may be computed
inductively. Since $c_0 = y(0)$ and $c_1 = y'(0)$, this degree of flexibility is consistent
with the existence and uniqueness theorems.
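
The inductive computation of the coefficients can be sketched in a few lines of code. The block below is a minimal illustration, not part of the text's development: the function name `series_coefficients` and its argument conventions are assumptions, with `a` and `b` holding finitely many coefficients of $P$ and $Q$ and any missing coefficient treated as 0. It implements the general equation $n(n-1)c_n + \sum_{j=0}^{n-2} a_j(n-1-j)c_{n-1-j} + \sum_{j=0}^{n-2} b_j c_{n-2-j} = 0$.

```python
from fractions import Fraction

def series_coefficients(a, b, c0, c1, n_terms):
    """Return [c_0, ..., c_{n_terms-1}] for y'' + P(t)y' + Q(t)y = 0.

    a[k] and b[k] are the coefficients of t^k in P and Q; coefficients
    beyond the end of either list are taken to be 0.
    """
    def coef(seq, k):
        return seq[k] if k < len(seq) else 0

    c = [Fraction(c0), Fraction(c1)]
    for n in range(2, n_terms):
        total = Fraction(0)
        for j in range(n - 1):  # j = 0, ..., n-2
            total += coef(a, j) * (n - 1 - j) * c[n - 1 - j]
            total += coef(b, j) * c[n - 2 - j]
        # Solve n(n-1) c_n + total = 0 for c_n.
        c.append(-total / (n * (n - 1)))
    return c[:n_terms]

# y'' + y = 0 with y(0) = 1, y'(0) = 0: the coefficients of cos t.
cosine_coefficients = series_coefficients([], [1], 1, 0, 6)

# y'' - y' = 0 with y(0) = 0, y'(0) = 1: the coefficients of e^t - 1.
exp_minus_one_coefficients = series_coefficients([-1], [], 0, 1, 4)
```

Exact rational arithmetic is used so that the computed coefficients can be compared directly with known Taylor coefficients.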


Theorem 4.14. If $P(t)$ and $Q(t)$ are given by convergent power series for
$|t| < R$, then any formal power series that satisfies $y'' + P(t)y' + Q(t)y = 0$
converges for $|t| < R$ to a solution. Consequently every solution of this equation
on the interval $-R < t < R$ is given by a power series convergent for $|t| < R$.


PROOF. Fix $r$ with $0 < r < R$, and choose some $R_1$ with $r < R_1 < R$. Let
the notation for the power series of $P$, $Q$, and $y$ be as above. Theorem 1.37
shows that the series with terms $|a_n R_1^n|$ and $|b_n R_1^n|$ are convergent, and hence the
terms are bounded as functions of $n$. Thus there exists a real number $C$ such that
$|a_n| \leq C/R_1^n$ and $|b_n| \leq C/R_1^n$ for all $n \geq 0$. We shall show that $|c_n| \leq M/r^n$
for a suitable $M$ and all $n \geq 0$.

The constant $M$ will be fixed so that a large initial number of terms have
$|c_n| \leq M/r^n$, and then we shall see that all subsequent terms satisfy the same
inequality. To find an $M$ that works, we start from the formula computed above
for $c_n$:

\[ n(n-1)c_n = -(a_0(n-1)c_{n-1} + a_1(n-2)c_{n-2} + \cdots + a_{n-2}c_1)
 - (b_0c_{n-2} + b_1c_{n-3} + \cdots + b_{n-2}c_0). \]

If $M$ works for $0, 1, \ldots, n-1$, then

\[
\begin{aligned}
n(n-1)|c_n| &\leq CM\big((n-1)R_1^{0}r^{-(n-1)} + (n-2)R_1^{-1}r^{-(n-2)} + \cdots + R_1^{-(n-2)}r^{-1}\big) \\
&\quad + CM\big(R_1^{0}r^{-(n-2)} + R_1^{-1}r^{-(n-3)} + \cdots + R_1^{-(n-2)}r^{0}\big) \\
&\leq CM(n-1)r^{-n+1}\Big(1 + \frac{r}{R_1} + \cdots + \Big(\frac{r}{R_1}\Big)^{n-2}\Big) \\
&\quad + CMr^{-n+2}\Big(1 + \frac{r}{R_1} + \cdots + \Big(\frac{r}{R_1}\Big)^{n-2}\Big) \\
&\leq r^{-n}(CM)\big(r(n-1) + r^2\big)\,\frac{1}{1 - (r/R_1)},
\end{aligned}
\]

and therefore

\[ |c_n| \leq Mr^{-n}\Big(\frac{CR_1}{R_1 - r}\cdot\frac{r(n-1) + r^2}{n(n-1)}\Big). \]

For $n$ sufficiently large, the factor in parentheses is $\leq 1$. At that point we obtain
$|c_n| \leq Mr^{-n}$ if $|c_k| \leq Mr^{-k}$ for $k < n$, and induction yields the asserted estimate.
Thus $\sum c_n t^n$ converges for $|t| < r$. Since $r$ can be arbitrarily close to $R$, $\sum c_n t^n$
converges for $|t| < R$.

Finally we saw above that $c_0$ and $c_1$ are arbitrary and can therefore be matched
to any initial data for $y(0)$ and $y'(0)$. Consequently the vector space of power-
series solutions convergent for $|t| < R$ has dimension 2. By Theorem 4.6, all
solutions on the interval $-R < t < R$ are accounted for. This completes the
proof.


As a practical matter, the recursive expression for $c_n$ becomes increasingly
complicated as $n$ increases, and a closed-form expression need not be available.
However, in certain cases, something special happens that yields a closed-form
expression for $c_n$. Here is an example.


EXAMPLE. Legendre's equation is
\[ (1 - t^2)y'' - 2ty' + p(p+1)y = 0 \]

with $p$ a complex constant. To apply the theorem literally, we should first divide
the equation by $(1 - t^2)$, and then the power-series expansions of the coefficients
will be convergent for $|t| < 1$. The theorem says that we obtain two linearly
independent power-series solutions of the equation for $|t| < 1$. To compute them,
it is more convenient to work with the equation without making the preliminary
division. Then the equation gives us

\[ (2c_2 + 3\cdot 2c_3 t + 4\cdot 3c_4 t^2 + \cdots) - (2c_2 t^2 + 3\cdot 2c_3 t^3 + 4\cdot 3c_4 t^4 + \cdots)
 - 2(c_1 t + 2c_2 t^2 + 3c_3 t^3 + 4c_4 t^4 + \cdots) + p(p+1)(c_0 + c_1 t + c_2 t^2 + \cdots) = 0, \]

which yields the following formulas for the coefficients:

\[ 2c_2 + p(p+1)c_0 = 0, \]
\[ 3\cdot 2c_3 - 2c_1 + p(p+1)c_1 = 0, \]
\[ 4\cdot 3c_4 - 2\cdot 1c_2 - 2\cdot 2c_2 + p(p+1)c_2 = 0, \]
\[ \vdots \]
\[ n(n-1)c_n - [(n-2)(n-3) + 2(n-2) - p(p+1)]c_{n-2} = 0. \]

Thus we can write $c_n$ explicitly as a product. We can verify convergence of
$\sum c_n t^n$ directly by the ratio test: since

\[ \frac{c_n t^n}{c_{n-2}t^{n-2}} = \frac{(n-2)(n-3) + 2(n-2) - p(p+1)}{n(n-1)}\,t^2 \longrightarrow t^2, \]

we have convergence for $|t| < 1$. Observe that the numerator in the fraction on
the right is equal to

\[ (n-2)(n-3) + 2(n-2) - p(p+1) = (n-2)(n-1) - p(p+1), \]

and this is 0 when $p$ is an integer $\geq 0$ and $n - 2 = p$. Therefore one of the
solutions is a polynomial of degree $p$ if $p$ is an integer $\geq 0$. Such polynomials,
when suitably normalized, are called Legendre polynomials.
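
The termination of the series for integer $p$ is easy to observe numerically. The following sketch assumes $p$ is an integer so that exact rational arithmetic applies; the function name `legendre_series` is illustrative. It uses the two-step recursion $n(n-1)c_n = [(n-2)(n-1) - p(p+1)]c_{n-2}$ obtained above.

```python
from fractions import Fraction

def legendre_series(p, c0, c1, n_terms):
    """Return [c_0, ..., c_{n_terms-1}] for Legendre's equation.

    Assumes p is an integer, so that every coefficient is rational.
    """
    c = [Fraction(c0), Fraction(c1)]
    for n in range(2, n_terms):
        numerator = (n - 2) * (n - 1) - p * (p + 1)
        c.append(Fraction(numerator, n * (n - 1)) * c[n - 2])
    return c[:n_terms]

# p = 2 with c_0 = 1, c_1 = 0: the series stops after the t^2 term.
even_solution = legendre_series(2, 1, 0, 8)

# p = 3 with c_0 = 0, c_1 = 1: the series stops after the t^3 term.
odd_solution = legendre_series(3, 0, 1, 8)
```

For $p = 2$ the result is $1 - 3t^2$, and for $p = 3$ it is $t - \tfrac53 t^3$; these are constant multiples of the usual Legendre polynomials $\tfrac12(3t^2 - 1)$ and $\tfrac12(5t^3 - 3t)$.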


The second kind of ordinary differential equation for which we shall seek series
solutions is
\[ t^2y'' + tP(t)y' + Q(t)y = 0, \]

where $P(t)$ and $Q(t)$ are given by convergent power-series expansions for
$|t| < R$:

\[ P(t) = a_0 + a_1 t + a_2 t^2 + \cdots, \]
\[ Q(t) = b_0 + b_1 t + b_2 t^2 + \cdots. \]

The existence and uniqueness theorems do not apply to this equation on an interval
containing $t = 0$ unless $t$ happens to divide $P(t)$ and $t^2$ happens to divide
$Q(t)$. When this divisibility does not occur, the above equation is said to have a
regular singular point at $t = 0$. The treatment of the corresponding $n^{\text{th}}$-order
equation is no different, but we stick to the second-order case because of its
relative importance in applications. For this kind of equation, the treatment of
first-order systems is more complicated than the treatment of a single equation of
$n^{\text{th}}$ order.

Actually, the second-order equation above need not have power series solutions.
The prototype for the above equation is the equation

\[ t^2y'' + tPy' + Qy = 0 \]

with $P$ and $Q$ constant. This equation is known as Euler's equation and can be
solved in terms of elementary functions. In fact, we make a change of variables
by putting $t = e^x$ and $x = \log t$ for $t > 0$. Then we obtain

\[ \frac{dy}{dt} = \frac{dy}{dx}\frac{dx}{dt} = \frac{1}{t}\frac{dy}{dx} \]

and

\[ \frac{d^2y}{dt^2} = -\frac{1}{t^2}\frac{dy}{dx} + \frac{1}{t}\frac{d}{dt}\Big(\frac{dy}{dx}\Big)
 = -\frac{1}{t^2}\frac{dy}{dx} + \frac{1}{t}\frac{d^2y}{dx^2}\frac{dx}{dt}
 = -\frac{1}{t^2}\frac{dy}{dx} + \frac{1}{t^2}\frac{d^2y}{dx^2}, \]

and hence the equation becomes

\[ \frac{d^2y}{dx^2} + (P - 1)\frac{dy}{dx} + Qy = 0. \]

This is an equation of the kind considered in Section 6. A solution is $e^{sx}$, where $s$
is a root of the characteristic polynomial $s^2 + (P - 1)s + Q = 0$. If the two roots
of the characteristic polynomial are distinct, we obtain two linearly independent
solutions for $x \in (-\infty, +\infty)$, and these transform back to two solutions $t^s$ of
the Euler equation for $t > 0$. If the characteristic equation has one root $s$ of
multiplicity 2, then we obtain the two linearly independent solutions $e^{sx}$ and $xe^{sx}$
for $x \in (-\infty, +\infty)$, and these transform back to two solutions $t^s$ and $t^s \log t$
for $t > 0$.

In practice, the technique to solve the Euler equation $t^2y'' + tPy' + Qy = 0$
is to substitute $y(t) = t^s$ and obtain $s(s-1)t^s + sPt^s + Qt^s = 0$. This equation
holds if and only if $s$ satisfies

\[ s(s-1) + sP + Q = 0, \]

which is called the indicial equation.
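
As a quick numerical illustration of the substitution $y(t) = t^s$, the sketch below solves the indicial equation for constant $P$ and $Q$ and evaluates the left side of Euler's equation at a sample point. The function names `indicial_roots` and `euler_residual` are assumptions made for this illustration, and complex arithmetic is used so that complex roots are handled as well.

```python
import cmath

def indicial_roots(P, Q):
    """Roots of s(s-1) + sP + Q = 0, that is, s^2 + (P-1)s + Q = 0."""
    disc = cmath.sqrt((P - 1) ** 2 - 4 * Q)
    return ((-(P - 1) + disc) / 2, (-(P - 1) - disc) / 2)

def euler_residual(P, Q, s, t):
    """Value of t^2 y'' + t P y' + Q y at t > 0 for y(t) = t^s."""
    y = t ** s
    dy = s * t ** (s - 1)
    d2y = s * (s - 1) * t ** (s - 2)
    return t * t * d2y + t * P * dy + Q * y

# Example: t^2 y'' + 4t y' + 2y = 0 has indicial equation s^2 + 3s + 2 = 0,
# with roots -1 and -2, so 1/t and 1/t^2 are solutions for t > 0.
root_1, root_2 = indicial_roots(4, 2)
residuals = [abs(euler_residual(4, 2, s, 1.7)) for s in (root_1, root_2)]
```

Both residuals vanish up to rounding error, in agreement with the statement that $t^s$ solves the equation exactly when $s$ satisfies the indicial equation.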
In the general case of a regular singular point, we proceed by analogy and are
led to seek for $t > 0$ a series solution of the form

\[ y(t) = t^s(c_0 + c_1 t + c_2 t^2 + \cdots) \qquad\text{with } c_0 \neq 0. \]

Suppose that the power-series part $\sum c_n t^n$ is convergent. We substitute and obtain

\[ t^s(c_0 s(s-1) + c_1(s+1)s\,t + c_2(s+2)(s+1)t^2 + \cdots)
 + t^s(a_0 + a_1 t + a_2 t^2 + \cdots)(sc_0 + (s+1)c_1 t + (s+2)c_2 t^2 + \cdots)
 + t^s(b_0 + b_1 t + b_2 t^2 + \cdots)(c_0 + c_1 t + c_2 t^2 + \cdots) = 0. \]

Dividing by $t^s$ and setting the coefficient of each power of $t$ equal to 0 gives the
equations

\[ c_0 s(s-1) + sc_0a_0 + c_0b_0 = 0, \]
\[ c_1(s+1)s + ((s+1)c_1a_0 + sc_0a_1) + (c_1b_0 + c_0b_1) = 0, \]
\[ c_2(s+2)(s+1) + ((s+2)c_2a_0 + \cdots) + (c_2b_0 + \cdots) = 0, \]
\[ \vdots \]
\[ c_n(s+n)(s+n-1) + ((s+n)c_na_0 + \cdots) + (c_nb_0 + \cdots) = 0. \]

Since $c_0$ is by assumption nonzero, we can divide the first equation by it, and we
obtain

\[ s(s-1) + a_0s + b_0 = 0, \]

which is the indicial equation for $t^2y'' + tP(t)y' + Q(t)y = 0$. This determines
the exponent $s$. Then $c_0$ is arbitrary, and all subsequent $c_n$'s can be found
recursively, provided the coefficient of $c_n$ in the $(n+1)^{\text{st}}$ equation above is
never 0 for $n \geq 1$, i.e., provided

\[ (s+n)(s+n-1) + (s+n)a_0 + b_0 \neq 0 \qquad\text{for } n \geq 1. \]

In other words, we can solve recursively for all $c_n$ in terms of $c_0$ provided $s + n$
does not satisfy the indicial equation for any $n \geq 1$. We summarize as follows.
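
The recursion just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `frobenius_coefficients` is hypothetical, `a` and `b` hold finitely many coefficients of $P$ and $Q$ with missing entries treated as 0, and the exponent `s` is assumed rational so that exact arithmetic can be used. The $n$-th step solves $c_n[(s+n)(s+n-1) + a_0(s+n) + b_0] = -\sum_{j=1}^{n} [(s+n-j)a_j + b_j]c_{n-j}$ and stops with an error if $s + n$ is itself a root of the indicial equation.

```python
from fractions import Fraction

def frobenius_coefficients(a, b, s, c0, n_terms):
    """Return [c_0, ..., c_{n_terms-1}] for t^s * sum c_n t^n solving
    t^2 y'' + t P(t) y' + Q(t) y = 0, where s is a root of the
    indicial equation and a[k], b[k] are the coefficients of P and Q."""
    def coef(seq, k):
        return seq[k] if k < len(seq) else 0

    def indicial(x):
        return x * (x - 1) + coef(a, 0) * x + coef(b, 0)

    c = [Fraction(c0)]
    for n in range(1, n_terms):
        denom = indicial(s + n)
        if denom == 0:
            # s + n is also a root: the recursion cannot be solved for c_n.
            raise ZeroDivisionError("s + %d satisfies the indicial equation" % n)
        total = sum(((s + n - j) * coef(a, j) + coef(b, j)) * c[n - j]
                    for j in range(1, n + 1))
        c.append(-total / denom)
    return c[:n_terms]

# Illustration: t^2 y'' + t y' + t^2 y = 0, so P(t) = 1 and Q(t) = t^2,
# with indicial root s = 0 and c_0 = 1.
example_coefficients = frobenius_coefficients([1], [0, 0, 1], 0, 1, 5)
```

In the illustration the odd coefficients vanish and the even ones are $1, -\tfrac14, \tfrac1{64}$, which can be checked by hand from the displayed equations.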


Proposition 4.15. If $P(t)$ and $Q(t)$ are given by convergent power series
for $|t| < R$, then the following can be said about formal series solutions of
$t^2y'' + tP(t)y' + Q(t)y = 0$ of the type $t^s(c_0 + c_1 t + c_2 t^2 + \cdots)$ with $c_0 \neq 0$:

(a) If the indicial equation has distinct roots not differing by an integer, then
there are formal solutions of the type $t^s(c_0 + c_1 t + c_2 t^2 + \cdots)$ for each
root $s$ of the indicial equation.

(b) If the indicial equation has roots $r_1 \leq r_2$ with $r_2 - r_1$ equal to an
integer, then there is a 1-parameter family of formal solutions of the type
$t^{r_2}(c_0 + c_1 t + c_2 t^2 + \cdots)$ with $c_0 \neq 0$. If $r_1 < r_2$ in addition, there may
be formal solutions $t^{r_1}(c_0 + c_1 t + c_2 t^2 + \cdots)$ with $c_0 \neq 0$, as there are
for an Euler equation.


Theorem 4.16. If $P(t)$ and $Q(t)$ are given by convergent power series for
$|t| < R$, then all formal series solutions of $t^2y'' + tP(t)y' + Q(t)y = 0$ of the
type $t^s(c_0 + c_1 t + c_2 t^2 + \cdots)$ with $c_0 \neq 0$ converge for $0 < t < R$ to a function
that is a solution for $0 < t < R$.

PROOF. As in the proof of Theorem 4.14, fix $r$ with $0 < r < R$, and choose
some $R_1$ with $r < R_1 < R$. Let the series expansions of $P(t)$ and $Q(t)$ be as
above, so that there is a number $C$ with $|a_n| \leq C/R_1^n$ and $|b_n| \leq C/R_1^n$. Choose
$N$ large enough so that

\[ \frac{Cr/R_1}{1 - r/R_1}\Big(\frac{|s| + n + 1}{|(s+n)(s+n-1) + a_0(s+n) + b_0|}\Big) \leq 1 \tag{$*$} \]

for $n \geq N$. Then choose $M$ such that $|c_n| \leq M/r^n$ for $n \leq N$. We shall prove
by induction on $n$ that $|c_n| \leq M/r^n$ for all $n$. The base case of the induction
is $n = N$, where the inequality holds by definition of $M$. Suppose it holds for
$0, 1, \ldots, n-1$. The formula for $c_n$ is

\[ c_n((s+n)(s+n-1) + a_0(s+n) + b_0)
 = -[(s+n-1)a_1c_{n-1} + \cdots + sa_nc_0] - [b_1c_{n-1} + \cdots + b_nc_0]. \]

Our inductive hypothesis gives

\[
\begin{aligned}
|c_n|\,|(s+n)(s+n-1) + a_0(s+n) + b_0|
&\leq CM(|s| + n + 1)\big(R_1^{-1}r^{-(n-1)} + \cdots + R_1^{-n}r^{0}\big) \\
&= CM(|s| + n + 1)\,r^{-n}\Big(\frac{r}{R_1} + \cdots + \frac{r^n}{R_1^n}\Big) \\
&\leq Mr^{-n}\Big[C(|s| + n + 1)\,\frac{r/R_1}{1 - r/R_1}\Big].
\end{aligned}
\]

Thus

\[ |c_n| \leq Mr^{-n}\,\frac{Cr/R_1}{1 - r/R_1}\Big(\frac{|s| + n + 1}{|(s+n)(s+n-1) + a_0(s+n) + b_0|}\Big) \leq Mr^{-n}, \]

the second inequality holding by $(*)$, and the induction is complete.
It follows that $\sum c_n t^n$ converges for $|t| < r$. Since $r$ can be arbitrarily close
to $R$, $\sum c_n t^n$ converges for $|t| < R$. This completes the proof.


EXAMPLE. Bessel's equation of order $p$ with $p \geq 0$. This is the equation
\[ t^2y'' + ty' + (t^2 - p^2)y = 0. \]

It has $P(t) = 1$ and $Q(t) = t^2 - p^2$, both with infinite radius of convergence.
The indicial equation in general is $s(s-1) + a_0s + b_0 = 0$ and hence is

\[ s(s-1) + s - p^2 = 0 \]

in this case. Thus $s = \pm p$. Theorem 4.16 shows that there is a solution of the
form

\[ J_p(t) = t^p(c_0 + c_1 t + c_2 t^2 + \cdots), \]

and this is defined to be the Bessel function of order $p$. The theorem gives
another solution of the form $t^{-p}$ times a power series except possibly when $p$ is
an integer or a half integer. To determine all these solutions, we substitute the
series $t^s \sum c_n t^n$ and get

\[
\begin{aligned}
& s(s-1)c_0 + (s+1)sc_1 t + (s+2)(s+1)c_2 t^2 + \cdots \\
&\quad + sc_0 + (s+1)c_1 t + (s+2)c_2 t^2 + \cdots \\
&\quad + c_0 t^2 + c_1 t^3 + \cdots \\
&\quad - p^2c_0 - p^2c_1 t - p^2c_2 t^2 - p^2c_3 t^3 - \cdots = 0.
\end{aligned}
\]

The resulting equations are

\[ [s(s-1) + s - p^2]c_0 = 0 \qquad\text{from } t^0, \]
\[ [(s+1)s + (s+1) - p^2]c_1 = 0 \qquad\text{from } t^1, \]
\[ [(s+n)(s+n-1) + (s+n) - p^2]c_n + c_{n-2} = 0 \qquad\text{from } t^n \text{ for } n \geq 2. \]

The first of these equations repeats the indicial equation, giving $s = \pm p$. The
second says that either $c_1 = 0$ or that $s + 1$ solves the indicial equation. In the
latter case $s = -\tfrac12$ and $p = \tfrac12$. The third says that $[(s+n)^2 - p^2]c_n = -c_{n-2}$.
For the case that $s = +p$, we obtain

\[ c_n = \frac{-c_{n-2}}{(p+n)^2 - p^2}, \]

and there is no problem from the denominator. The result is that the Bessel
function of order $p \geq 0$ is given by

\[ J_p(t) = \frac{t^p}{2^p\,\Gamma(p+1)}\Big(1 - \frac{t^2}{2(2p+2)} + \frac{t^4}{2\cdot 4(2p+2)(2p+4)} - \cdots\Big). \]

FIGURE 4.3. Graph of Bessel function $J_0(t)$.

For the case that $s = -p$, we obtain

\[ c_n = \frac{-c_{n-2}}{(-p+n)^2 - p^2}, \]

and the denominator gives a problem for $n = 2p$ and for no other value of $n$. If
$p$ is an integer, the problematic $n$ is even and we must have $c_{n-2} = 0$, $c_{n-4} = 0$,
$\ldots$, $c_0 = 0$. The condition $c_0 = 0$ is a contradiction, and we conclude that there
is no solution of the form $t^{-p}$ times a nonzero power series; indeed, Problems
18–19 at the end of the chapter will identify a different kind of solution. If $p$ is
a half integer but not an integer, then the problematic $n$ is odd, and we are led to
conclude that $0 = c_{n-2} = \cdots = c_3 = c_1$, with $c_0$ and $c_{2p}$ arbitrary. There is no
contradiction, and we obtain a solution of the form $t^{-p}$ times a nonzero power
series.
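
The recursion for the case $s = +p$ lends itself to direct numerical evaluation. The sketch below is illustrative only: the function name `bessel_series` is hypothetical, the normalization $c_0 = 1/(2^p\,\Gamma(p+1))$ is the customary one and is assumed here, and the series is simply truncated after a fixed number of terms. It uses $c_n = -c_{n-2}/((p+n)^2 - p^2) = -c_{n-2}/(n(2p+n))$ with $c_1 = 0$.

```python
import math

def bessel_series(p, t, n_terms=40):
    """Approximate J_p(t) for t > 0 by a truncated series t^p * sum c_n t^n.

    Assumes the customary normalization c_0 = 1 / (2^p Gamma(p + 1)).
    """
    c = 1.0 / (2.0 ** p * math.gamma(p + 1.0))  # c_0
    total = c
    for n in range(2, 2 * n_terms, 2):
        # (p + n)^2 - p^2 = n (2p + n); the odd coefficients are all 0.
        c = -c / (n * (2 * p + n))
        total += c * t ** n
    return t ** p * total

# The smallest positive zero of J_0 is approximately 2.404825557695773.
value_at_first_zero = bessel_series(0, 2.404825557695773)

# For p = 1/2 the series sums to sqrt(2 / (pi t)) * sin t.
half_order_value = bessel_series(0.5, 1.3)
half_order_closed_form = math.sqrt(2.0 / (math.pi * 1.3)) * math.sin(1.3)
```

The half-order comparison works because for $p = \tfrac12$ the series in parentheses reduces to $(\sin t)/t$.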


9. Problems 


1. For the differential equation $yy' = -t$:

   (a) Solve the equation.

   (b) Find all points $(t_0, y_0)$ where the existence theorem and the uniqueness
   theorem of Section 2 do not apply.

   (c) For each point $(t_0, y_0)$ not in (b), give a solution $y(t)$ with $y(t_0) = y_0$.

2. Prove that the equation $y' = t + y^2$ has a solution satisfying the initial condition
   $y(0) = 0$ and defined for $|t| < 1/2$.

3. In classical notation, a particular vector field in the plane is given by
   $\sqrt{x}\,\frac{\partial}{\partial x} + \frac12\,\frac{\partial}{\partial y}$.
   Find a parametric realization of an integral curve for this vector field passing
   through $(1, 1)$.

4. Evaluate $\dfrac{d}{dt}\displaystyle\int_0^t \frac{1}{s}(\sin st)\,ds$.

5. Find all solutions on $(-\infty, +\infty)$ to $y'' - 3y' + 2y = 4$.

6. (a) For each of these matrices A, find matrices B and J, with J in Jordan form,


. 4 0 0 -1 
such that A= BJB7!: a=(j =) a=(0 1 0) 
1 0 0 


(b) For each of the matrices A in (a), find a basis of solutions y(t) to the system 
of differential equations y’ = Ay. 


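7. The $n^{\text{th}}$-order equation $y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_0y = 0$ with constant coefficients
   leads to a linear system $z' = Az$ with

\[ A = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 \\
-a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1}
\end{pmatrix}. \]

   Prove that $\det(\lambda 1 - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0$ by expanding the determinant
   by cofactors.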

8. (a) Let $\{f_n\}$ be a uniformly bounded sequence of Riemann integrable functions
   on $[0, 1]$. Define $F_n(t) = \int_0^t f_n(s)\,ds$. Prove that $\{F_n\}$ is an equicontinuous
   family of functions on $[0, 1]$.

   (b) Prove that the set of functions $y(t)$ on $[0, 1]$ with $y'' + y = f(t)$ and
   $y(0) = y'(0) = 0$ is equicontinuous as $f$ varies over the set of continuous
   functions on $[0, 1]$ with $0 \leq f(t) \leq 1$ for all $t$.

   (c) Let $q(t)$ be continuous on $[a, b]$. Prove that the set of functions $y(t)$ on
   $[a, b]$ with $y'' + q(t)y = f(t)$ and $y(0) = y'(0) = 0$ is equicontinuous as
   $f(t)$ varies over the set of continuous functions on $[0, 1]$ with $0 \leq f(t) \leq 1$
   for all $t$.

9. The differential equation $t^2y'' + (3t - 1)y' + y = 0$ has an irregular singular
   point at $t = 0$.

   (a) Verify that $\sum_{n=0}^{\infty} (n!)t^n$ is a formal power series solution of the equation
   even though the power series has radius of convergence 0.

   (b) Verify that $y(t) = t^{-1}e^{-1/t}$ is a solution for $t > 0$.

Problems 10–13 concern harmonic functions in the open unit disk, which were
introduced in Problems 14–15 at the end of Chapter III. The first objective here is to use
ordinary differential equations and Fourier series to show that all these functions may
be expressed in a relatively simple form. The second objective is to use convolution,
as defined in Problem 8 at the end of Chapter III, to relate this formula to the Poisson
kernel, which was defined in Problems 27–29 at the end of Chapter I. Problems 10–12
here are an instance of the method of separation of variables, a beginning technique
with partial differential equations; this topic is developed further in the companion
volume Advanced Real Analysis. In all problems in this set, let $u(x, y)$ be harmonic
in the open unit disk.

10. Write $u(x, y)$ in polar coordinates as $u(r\cos\theta, r\sin\theta) = v(r, \theta)$. Using Fourier
    series, show for $0 \leq r < 1$ and any $\delta > 0$ that $v(r, \theta)$ is the sum of an absolutely
    convergent Fourier series $\sum_{n=-\infty}^{\infty} c_n(r)e^{in\theta}$ with $|c_n(r)| \leq M/n^2$ for $0 \leq r \leq
    1 - \delta$ for some $M$ depending on $\delta$.

11. Let $R_\varphi$ be the rotation matrix defined in Problem 15 at the end of Chapter II. That
    problem shows that $(u \circ R_\varphi)(x, y) = v(r, \theta + \varphi)$ is harmonic for each $\varphi$. Prove
    that $\frac{1}{2\pi}\int_{-\pi}^{\pi} (u \circ R_\varphi)(x, y)\,e^{-ik\varphi}\,d\varphi$ is harmonic and is given in polar coordinates
    by $c_k(r)e^{ik\theta}$.

12. By computing with the Laplacian in polar coordinates and showing that $c_k(r)$
    is bounded as $r \downarrow 0$, prove that $c_k(r) = a_kr^{|k|}$ for some complex constant
    $a_k$. Conclude that every harmonic function in the open unit disk is of the form
    $v(r, \theta) = \sum_{n=-\infty}^{\infty} c_nr^{|n|}e^{in\theta}$, the sum being absolutely convergent for all $r$ with
    $0 \leq r < 1$.

13. Deduce from Problem 8 at the end of Chapter III that if $v(r, \theta)$ is as in the
    previous problem and if $0 < R < 1$, then $v(r, \theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f_R(\varphi)P_{r/R}(\theta - \varphi)\,d\varphi$
    for $0 \leq r < R$, where $P$ is the Poisson kernel and $f_R$ is the $C^\infty$ function
    $f_R(\theta) = \sum_{n=-\infty}^{\infty} c_nR^{|n|}e^{in\theta}$.

Problems 14–17 concern homogeneous linear differential equations. Except for the
first of the problems, each works with a substitution in a second-order equation that
simplifies the equation in some way.

14. If $a(t)$ is continuous on an interval and $A(t)$ is an indefinite integral, verify that
    all solutions of the single first-order linear homogeneous equation $y' = a(t)y$
    are of the form $y(t) = ce^{A(t)}$.

15. (a) Suppose that $u(t)$ is a nowhere vanishing solution of

    \[ y'' + P(t)y' + Q(t)y = 0 \]

    on an interval, with $P$ and $Q$ assumed continuous. Look for a solution of
    the form $u(t)v(t)$, and derive the necessary and sufficient condition

    \[ v'(t) = c\,u(t)^{-2}e^{-\int P(t)\,dt}. \]

    (b) For $y'' - ty' - y = 0$, one solution is $e^{t^2/2}$. Find a linearly independent
    solution.

16. Let $y'' + P(t)y' + Q(t)y = 0$ be given with $P$, $P'$, and $Q$ continuous on an
    interval. Write $y(t) = u(t)v(t)$, substitute, regard $u(t)$ as known, and obtain a
    second-order equation for $v$. Show how to choose $u(t)$ to make the coefficient
    of $v'$ be 0, and thus reduce the given equation to an equation $v'' + R(t)v = 0$
    with $R$ continuous. Give a formula for $R$.

17. If $L(v) = (pv')' - qv + \lambda rv$, show that the substitution $u = v\sqrt{r}$ changes
    $L(v) = 0$ into $L_0(u) = 0$, where $L_0(u) = (p^*u')' - q^*u + \lambda u$ with $p^* = p/r$.


Problems 18–19 concern finding the form of the second solution to a second-order
equation with a regular singular point. The first of the two problems amounts to a
result in complex analysis but requires nothing beyond Chapter I of this book.

18. Suppose that $\sum_{n=0}^{\infty} c_nx^n$ is a power series with $c_0 = 1$.

    (a) Write down recursive formulas for the coefficients $d_n$ of a power series
    $\sum_{n=0}^{\infty} d_nx^n$ with $d_0 = 1$ such that $\big(\sum_{n=0}^{\infty} c_nx^n\big)\big(\sum_{n=0}^{\infty} d_nx^n\big) = 1$.

    (b) Prove, by induction on $n$, that if $|c_n| \leq Mr^n$ for all $n \geq 0$, then $|d_n| \leq
    M(M+1)^{n-1}r^n$ for all $n \geq 1$.

    (c) Prove that if $f(0) \neq 0$ and if $f(x)$ is the sum of a convergent power series
    for $|x| < R$ for some $R > 0$, then $1/f(x)$ is the sum of a convergent power
    series for $|x| < \varepsilon$ for some $\varepsilon > 0$.

19. Suppose that $P(t)$ and $Q(t)$ are given near $t = 0$ by power series with positive
    radii of convergence. Take for granted that if $a(t)$ is given by a power series with
    a positive radius of convergence, then so is $e^{a(t)}$. Form the equation

    \[ t^2y'' + tP(t)y' + Q(t)y = 0, \]

    let $s_1$ and $s_2$ be the two roots of the indicial equation, and suppose that the
    differential equation has a solution given on some interval $(0, \varepsilon)$ by $f(t) =
    t^{s_1}\sum_{n=0}^{\infty} c_nt^n$ with $c_0 \neq 0$.

    (a) Using Problem 15a, prove that the differential equation has a linearly
    independent solution given on some interval $(0, \varepsilon')$ by

    \[ g(t) = cf(t)\log t + t^{s_2}\sum_{n=0}^{\infty} k_nt^n \qquad\text{with } k_0 \neq 0. \]

    (b) Prove that the coefficient $c$ in $g(t)$ is $\neq 0$ if $s_1 = s_2$.

    (c) For Bessel's equation $t^2y'' + ty' + (t^2 - p^2)y = 0$ with $p > 0$ an integer
    and with $s_1 = p$ and $s_2 = -p$, show that the coefficient $c$ in $g(t)$ is $\neq 0$.
    Thus there is a solution of the form $J_p(t)\log t + t^{-p}(\text{power series})$ on some
    interval $(0, \varepsilon')$.


Problems 20–25 prove the Cauchy–Peano Existence Theorem, that a local solution
in Theorem 4.1 to $y' = F(t, y)$ and $y(t_0) = y_0$ exists if $F$ is continuous even
if $F$ does not satisfy a Lipschitz condition. The idea is to construct a sequence
of polygonal approximations to solutions, check that they form an equicontinuous
family, apply Ascoli's Theorem (Theorem 2.56) to extract a uniformly convergent
subsequence, and then see that the limit of the subsequence is a solution. A member
of the sequence of polygonal approximations depends on a number $\varepsilon > 0$. With
notation as in the statement of Theorem 4.1, the construction for $[t_0, t_0 + a']$ is as
follows: Choose the $\delta$ of uniform continuity for $F$ and $\varepsilon$ on the set $R$. Fix a partition
$t_0 < t_1 < \cdots < t_n = t_0 + a'$ of $[t_0, t_0 + a']$ with $\max_k \{t_k - t_{k-1}\} \leq \min(\delta, \delta/M)$.
Define $y(t)$, as a function of $\varepsilon$, for $t_{k-1} < t \leq t_k$ inductively on $k$ by $y(t_0) = y_0$ and

\[ y(t) = y(t_{k-1}) + F(t_{k-1}, y(t_{k-1}))(t - t_{k-1}). \]

20. Check that the formula for $y(t)$ when $t_{k-1} < t \leq t_k$ remains valid when $t =
    t_{k-1}$, and conclude that $y(t)$ is continuous. Then prove by induction on $k$ that
    $|y(t) - y(t_0)| \leq M(t - t_0) \leq b$ for $t_{k-1} \leq t \leq t_k$, and deduce that $(t, y(t))$ is in
    $R'$ for $t_0 \leq t \leq t_0 + a'$.

21. Prove that $|y(t) - y(t')| \leq M|t - t'|$ if $t$ and $t'$ are both in $[t_0, t_0 + a']$.

22. The function $y'(t)$ is defined on $[t_0, t_0 + a']$ except at the points of the partition
    and is given by $y'(t) = F(t_{k-1}, y(t_{k-1}))$ if $t_{k-1} < t < t_k$. Prove that $y(t) =
    y_0 + \int_{t_0}^{t} y'(s)\,ds$ for $t_0 \leq t \leq t_0 + a'$ and that $|y'(s) - F(s, y(s))| \leq \varepsilon$ if
    $t_{k-1} < s < t_k$.

23. Writing $y(t) = y_0 + \int_{t_0}^{t} \big[F(s, y(s)) + [y'(s) - F(s, y(s))]\big]\,ds$ and using the result
    of the previous problem, prove for all $t$ in $[t_0, t_0 + a']$ that

    \[ \Big|y(t) - \Big(y_0 + \int_{t_0}^{t} F(s, y(s))\,ds\Big)\Big| \leq \varepsilon a'. \]

24. Let $\varepsilon_n$ be a monotone decreasing sequence with limit 0, and let $y_n(t)$ be a function
    for $t$ in $[t_0, t_0 + a']$ constructed as above for the number $\varepsilon_n$. Deduce from
    Problem 21 that $\{y_n(t)\}$ is uniformly bounded and uniformly equicontinuous for
    $t$ in $[t_0, t_0 + a']$.

25. Apply Ascoli's Theorem to $\{y_n\}$, and let $y(t)$ be the uniform limit of a uniformly
    convergent subsequence of $\{y_n\}$. Prove that $y(t)$ is continuous, and use
    Problem 23 to prove that $y(t) = y_0 + \int_{t_0}^{t} F(s, y(s))\,ds$. What modifications are
    needed to the argument to handle $[t_0 - a', t_0]$?
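
The polygonal approximations used in Problems 20–25 can be sketched in code. The following is an illustration under simplifying assumptions rather than the construction of the problems verbatim: it uses a uniform partition with a prescribed number of steps instead of choosing the mesh from $\varepsilon$, $\delta$, and $M$, and the function name `polygonal_approximation` is hypothetical.

```python
import math

def polygonal_approximation(F, t0, y0, a, n_steps):
    """Return the partition points t_k and the values y(t_k) of the polygon.

    On each subinterval the polygon is the line through (t_{k-1}, y(t_{k-1}))
    with slope F(t_{k-1}, y(t_{k-1})).  A uniform partition of [t0, t0 + a]
    is assumed.
    """
    h = a / n_steps
    ts, ys = [t0], [y0]
    for k in range(1, n_steps + 1):
        t_prev, y_prev = ts[-1], ys[-1]
        t_next = t0 + k * h
        ts.append(t_next)
        ys.append(y_prev + F(t_prev, y_prev) * (t_next - t_prev))
    return ts, ys

# Illustration with y' = y, y(0) = 1 on [0, 1], whose solution is e^t.
ts_coarse, ys_coarse = polygonal_approximation(lambda t, y: y, 0.0, 1.0, 1.0, 100)
ts_fine, ys_fine = polygonal_approximation(lambda t, y: y, 0.0, 1.0, 1.0, 1000)
error_coarse = abs(ys_coarse[-1] - math.e)
error_fine = abs(ys_fine[-1] - math.e)
```

In this illustration the error at the right endpoint shrinks as the partition is refined, which is the behavior that the limiting argument with Ascoli's Theorem makes rigorous without a Lipschitz condition.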


CHAPTER V 


Lebesgue Measure and Abstract Measure Theory 


Abstract. This chapter develops the basic theory of measure and integration, including Lebesgue 
measure and Lebesgue integration for the line. 

Section 1 introduces measures, including 1-dimensional Lebesgue measure as the primary ex- 
ample, and develops simple properties of them. Sections 2-4 introduce measurable functions and 
the Lebesgue integral and go on to establish some easy properties of integration and the fundamental 
theorems about how Lebesgue integration behaves under limit operations. 

Sections 5–6 concern the Extension Theorem announced in Section 1 and used as the final step in
the construction of Lebesgue measure. The theorem allows σ-finite measures to be extended from
algebras of sets to σ-algebras. The theorem is proved in Section 5, and the completion of a measure
space is defined in Section 6 and related to the proof of the Extension Theorem.

Section 7 treats Fubini’s Theorem, which allows interchange of order of integration under rather 
general circumstances. This is a deep result. As part of the proof, product measure is constructed and 
important measurability conditions are established. This section mentions that Fubini’s Theorem will 
be applicable to higher-dimensional Lebesgue measure, but the details are deferred to Chapter VI. 

Section 8 extends Lebesgue integration to complex-valued functions and to functions with values 
in finite-dimensional vector spaces. 

Section 9 gives a careful definition of the spaces $L^1$, $L^2$, and $L^\infty$ for any measure space,
introduces the notion of a normed linear space, and verifies that these three spaces are examples.
The main theorem of the section about $L^1$, $L^2$, and $L^\infty$ is the completeness of these three spaces as
metric spaces. In addition, the section proves a version of Alaoglu's Theorem concerning weak-star
convergence.


1. Measures and Examples 


In the theory of the Riemann integral, as discussed in Chapter I for $\mathbb{R}^1$ and in
Chapter III for $\mathbb{R}^n$, we saw that Riemann integration is a powerful tool when
applied to continuous functions. Riemann integration makes sense also when
applied to certain kinds of discontinuous functions, but then the theory has some
weaknesses.

Without any change in the definitions, one of these is that the theory applies
only to bounded functions. Thus we can compute $\int_0^1 x^p\,dx = [x^{p+1}/(p+1)]_0^1 =
(p+1)^{-1}$ for $p \geq 0$, but only the right side makes sense for $-1 < p < 0$. More
seriously we made calculations with trigonometric series in Section I.10 and found
that

\[ \tfrac12\log\Big(\frac{1}{2 - 2\cos\theta}\Big) = \sum_{n=1}^{\infty} \frac{\cos n\theta}{n}
 \qquad\text{and}\qquad
 \tfrac12(\pi - \theta) = \sum_{n=1}^{\infty} \frac{\sin n\theta}{n}
 \qquad\text{for } 0 < \theta < 2\pi. \]

When we tried to explain these similar-looking identities with Fourier series, we
were able to handle the second one because $\tfrac12(\pi - \theta)$ is a bounded function, but
we were not able to handle the first one because $\tfrac12\log\big(\frac{1}{2 - 2\cos\theta}\big)$ is unbounded.

Other weaknesses appeared in Chapters I-IV at certain times: when we always 
had to arrange for the set of integration to be bounded, when we had no clue which 
sequences $\{c_n\}$ of Fourier coefficients occurred in the beautiful formula given by
Parseval’s Theorem, when Fubini’s Theorem turned out to be awkward to apply 
to discontinuous functions, and when the change-of-variables formula did not 
immediately yield the desired identities even in simple cases like the change from 
Cartesian coordinates to polar coordinates. 

The Lebesgue integral will solve all these difficulties when formed with respect
to “Lebesgue measure” in the setting of $\mathbb{R}^n$. In addition, the Lebesgue integral
will be meaningful in other settings. For example, the Lebesgue integral will be 
meaningful on the unit sphere in Euclidean space, while the Riemann integral 
would always require a choice of coordinates. The Lebesgue integral will be 
meaningful also in other situations where we can take advantage of some action 
by a group (such as a rotation group) that is difficult to handle when the setting has 
to be Euclidean. And the Lebesgue integral will enable us to provide a rigorous 
foundation for the theory of probability. 

There are five ingredients in Lebesgue integration, and these will be introduced 
in Sections 1-3 of this chapter: 


(i) an underlying nonempty set, such as $\mathbb{R}^1$ in the case of Lebesgue integration
on the line,
(ii) a distinguished class of subsets, called the “measurable sets,” which will
form a “σ-ring” or a “σ-algebra,”
(iii) a measure, which attaches a member of $[0, +\infty]$ to each measurable set
and which will be “length” in the case of Lebesgue measure on the line,
(iv) the “measurable functions,” those functions with values in $\mathbb{R}$ (or some
more general space) that we try to integrate,
(v) the integral of a measurable function over a measurable set.


Let us write X for the underlying nonempty set. The important thing about 
whatever sets are measurable will be that certain simple set-theoretic operations 
lead from measurable sets to measurable sets. The two main definitions are those 
of an “algebra” of sets and a “σ-algebra,” but we shall refer also to the notions of 
a “ring” of sets and a “σ-ring” in order to simplify certain technical problems in 
constructing measures. An algebra of sets $\mathcal{A}$ is a set of subsets of X containing 
$\varnothing$ and X and closed under the operation of forming the union $E \cup F$ of two 
sets and under taking the complement $E^c$ of a set. An algebra is necessarily 
closed under intersection $E \cap F$ and difference $E - F = E \cap F^c$. Another 
operation under which $\mathcal{A}$ is closed is symmetric difference, which is defined by 
$E \,\Delta\, F = (E - F) \cup (F - E)$; we shall make extensive use of this operation¹ in 
Section 6 of this chapter. 

In practice, despite the effort often needed to define an interesting measure on 
the sets in an algebra, the closure properties² of the algebra are insufficient to deal 
with questions about limits. For this reason one defines a σ-algebra of subsets of 
X to be an algebra that is closed under countable unions (and hence also countable 
intersections). Typically a general foundational theorem (Theorem 5.5 below) is 
used to extend the constructed would-be measure from an algebra to a σ-algebra. 

A ring $\mathcal{R}$ of subsets of X is a set of subsets closed under finite unions and under 
difference. Then $\mathcal{R}$ is closed also under the operations of finite intersections 
and symmetric difference.³ A σ-ring of subsets of X is a ring of 
subsets that is closed under countable unions. 


EXAMPLES. 
(1) $\mathcal{A} = \{\varnothing, X\}$. This is a σ-algebra. 

(2) All subsets of X. This is a σ-algebra. 

(3) All finite subsets of X. This is a ring. If the complements of such sets are 
included, the result is an algebra. 

(4) All finite and countably infinite subsets of X. This is a σ-ring. If the 
complements of such sets are included, the result is a σ-algebra. 


(5) All elementary sets of $\mathbb{R}^1$. These are all finite disjoint unions of bounded 
intervals in $\mathbb{R}^1$ with or without endpoints. This collection is a ring. To see the 
closure properties, we first verify that any finite union of bounded intervals is a 
finite disjoint union; in fact, if $I_1, \dots, I_n$ are bounded intervals such that none 
contains any of the others, then $I_k - \bigcup_{m=1}^{k-1} I_m$ is an interval, and these intervals 
are disjoint as k varies; also these intervals have the same union as $I_1, \dots, I_n$. 
Now let $E = \bigcup_i I_i$ and $F = \bigcup_j J_j$ be given. Since $I_i \cap J_j$ is an interval, 
the identity $E \cap F = \bigcup_{i,j} (I_i \cap J_j)$ shows that $E \cap F$ is a finite union of 
intervals. Since each $I_i - J_j$ is an interval or the union of two intervals, the 
identity $E - F = \bigcup_i \bigcap_j (I_i - J_j)$ then shows that $E - F$ is a finite union of 
intervals. 


¹ For some properties of symmetric difference, see Problem 1 at the end of the chapter. 

² An algebra of sets really is an algebra in the sense of the discussion of algebras with the 
Stone-Weierstrass Theorem (Theorem 2.58). The scalars replacing $\mathbb{R}$ or $\mathbb{C}$ are the members of the 
two-element field {0, 1}, addition is given by symmetric difference, and multiplication is given by 
intersection. The additive identity is $\varnothing$, the multiplicative identity is X, and every element is its own 
negative. Multiplication is commutative. 

³ A ring of sets really is a ring in the sense of modern algebra; addition is given by symmetric 
difference, and multiplication is given by intersection. 


(6) If $\mathcal{C}$ is an arbitrary class of subsets of X, then there is a unique smallest 
algebra $\mathcal{A}$ of subsets of X containing $\mathcal{C}$. Similar statements apply to σ-algebras, 
rings, and σ-rings. In fact, consider all algebras of subsets of X containing $\mathcal{C}$. 
Example 2 shows that there is one. Let $\mathcal{A}$ be the intersection of all these algebras, 
i.e., the set of all subsets that occur in each of these algebras. If two sets occur 
in $\mathcal{A}$, they occur in each such algebra, and their intersection is in each algebra. 
Hence their intersection is in $\mathcal{A}$. Similarly $\mathcal{A}$ is closed under differences. 
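
When X is finite, the smallest algebra of Example 6 can be computed directly. The following Python sketch is an illustration only, not the argument in the text: instead of intersecting all algebras containing the given class, it starts from the class and repeatedly adjoins complements and pairwise unions until nothing new appears. The function name `generated_algebra` and the sample class are assumptions made for the illustration.

```python
from itertools import combinations

def generated_algebra(X, C):
    """Smallest algebra of subsets of the finite set X containing every member of C."""
    X = frozenset(X)
    algebra = {frozenset(), X} | {frozenset(S) for S in C}
    changed = True
    while changed:
        changed = False
        current = list(algebra)
        # Close under complements.
        for E in current:
            complement = X - E
            if complement not in algebra:
                algebra.add(complement)
                changed = True
        # Close under unions of two sets.
        for E, F in combinations(current, 2):
            union = E | F
            if union not in algebra:
                algebra.add(union)
                changed = True
    return algebra

X = {1, 2, 3, 4}
A = generated_algebra(X, [{1}, {1, 2}])
print(len(A))  # 8
```

Every set adjoined by the loop must lie in any algebra containing the class, so the result is the smallest such algebra. In the sample run the sets {1}, {2}, and {3, 4} cannot be split further, and the algebra consists of the 8 unions of these three sets.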


If $\mathcal{R}$ is a ring of subsets of X, a set function is a function $\rho : \mathcal{R} \to \mathbb{R}^*$, where 
$\mathbb{R}^*$ denotes the extended real-number system as in Section I.1. The set function 
is nonnegative if its values are all in $[0, +\infty]$, it is additive if $\rho(\varnothing) = 0$ and if 
$\rho(E \cup F) = \rho(E) + \rho(F)$ whenever E and F are disjoint sets in $\mathcal{R}$, and it is 
completely additive or countably additive if $\rho(\varnothing) = 0$ and if $\rho\bigl(\bigcup_{n=1}^{\infty} E_n\bigr) = 
\sum_{n=1}^{\infty} \rho(E_n)$ whenever the sets $E_n$ are pairwise disjoint members of $\mathcal{R}$ with 
$\bigcup_{n=1}^{\infty} E_n$ in $\mathcal{R}$. In the definitions of “additive” and “completely additive,” it is 
taken as part of the definition that the sums in question are to be well defined in 
$\mathbb{R}^*$. Observe that completely additive implies additive, since $\rho(\varnothing) = 0$. 


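Proposition 5.1. An additive set function $\rho$ on a ring $\mathcal{R}$ of sets has the following 
properties: 

(a) $\rho\bigl(\bigcup_{n=1}^{N} E_n\bigr) = \sum_{n=1}^{N} \rho(E_n)$ if the sets $E_n$ are pairwise disjoint and are 
in $\mathcal{R}$. 

(b) $\rho(E \cup F) + \rho(E \cap F) = \rho(E) + \rho(F)$ if E and F are in $\mathcal{R}$. 

(c) If E and F are in $\mathcal{R}$ and $|\rho(E)| < +\infty$, then $|\rho(E \cap F)| < +\infty$. 

(d) If E and F are in $\mathcal{R}$ and if $|\rho(E \cap F)| < +\infty$, then $\rho(E - F) = 
\rho(E) - \rho(E \cap F)$. 

(e) If $\rho$ is nonnegative and if E and F are in $\mathcal{R}$ with $E \subseteq F$, then $\rho(E) \le 
\rho(F)$. 

(f) If $\rho$ is nonnegative and if $E, E_1, \dots, E_N$ are sets in $\mathcal{R}$ such that $E \subseteq 
\bigcup_{n=1}^{N} E_n$, then $\rho(E) \le \sum_{n=1}^{N} \rho(E_n)$. 

(g) If $\rho$ is nonnegative and completely additive and if $E, E_1, E_2, \dots$ are sets 
in $\mathcal{R}$ such that $E \subseteq \bigcup_{n=1}^{\infty} E_n$, then $\rho(E) \le \sum_{n=1}^{\infty} \rho(E_n)$. 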

PROOF. Part (a) follows by induction from the definition. In (b), we have 
$E \cup F = (E - F) \cup (E \cap F) \cup (F - E)$ disjointly. Application of (a) gives 
$\rho(E \cup F) = \rho(E - F) + \rho(E \cap F) + \rho(F - E)$, with $+\infty$ and $-\infty$ not both 
occurring. Adding $\rho(E \cap F)$ to both sides, regrouping terms, and taking into 
account that $\rho(E) = \rho(E - F) + \rho(E \cap F)$ and $\rho(F) = \rho(F - E) + \rho(E \cap F)$, 
we obtain (b). The right side of the identity $\rho(E) = \rho(E \cap F) + \rho(E - F)$ cannot 
be well defined if $\rho(E)$ is finite and $\rho(E \cap F)$ is infinite, and thus (c) follows. In 
the identity $\rho(E) = \rho(E \cap F) + \rho(E - F)$, we can subtract $\rho(E \cap F)$ from both 
sides and obtain (d) if $\rho(E \cap F)$ is finite. For (e), the inclusion $E \subseteq F$ forces 
$F = (F - E) \cup E$ disjointly; then $\rho(F) = \rho(F - E) + \rho(E)$, and (e) follows. 
In (f), put $F_n = E_n - \bigcup_{k=1}^{n-1} E_k$. Then $E = \bigcup_{n=1}^{N} (E \cap F_n)$ disjointly, and (a) and 
(e) give $\rho(E) = \sum_{n=1}^{N} \rho(E \cap F_n) \le \sum_{n=1}^{N} \rho(F_n) \le \sum_{n=1}^{N} \rho(E_n)$. Conclusion 
(g) is proved in the same way as (f). 
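
Parts (b), (e), and (f) of Proposition 5.1 can be checked by brute force for a concrete nonnegative additive set function. The Python sketch below is an illustration under stated assumptions, not part of the proof: the four-point set, its weights, and the names `rho` and `all_subsets` are hypothetical choices, and the set function assigns to each subset the sum of the weights of its points.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical weights on a four-point set; rho(E) is the sum of the weights in E.
weights = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(0), "d": Fraction(5, 4)}

def rho(E):
    return sum((weights[x] for x in E), Fraction(0))

def all_subsets(points):
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

subsets = all_subsets(weights)

# (b): rho(E u F) + rho(E n F) = rho(E) + rho(F)
identity_b = all(rho(E | F) + rho(E & F) == rho(E) + rho(F)
                 for E in subsets for F in subsets)

# (e): monotonicity for a nonnegative set function
monotone_e = all(rho(E) <= rho(F) for E in subsets for F in subsets if E <= F)

# (f) with N = 2: E contained in E1 u E2 implies rho(E) <= rho(E1) + rho(E2)
subadditive_f = all(rho(E) <= rho(E1) + rho(E2)
                    for E in subsets for E1 in subsets for E2 in subsets
                    if E <= (E1 | E2))

print(identity_b, monotone_e, subadditive_f)  # True True True
```

Exact rational arithmetic is used so that the equalities in (b) are tested without rounding error.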


Proposition 5.2. Let $\rho$ be an additive set function on a ring $\mathcal{R}$ of sets. If $\rho$ 
is completely additive, then $\rho(E) = \lim \rho(E_n)$ whenever $\{E_n\}$ is an increasing 
sequence of members of $\mathcal{R}$ with union E in $\mathcal{R}$. Conversely if $\rho(E) = \lim \rho(E_n)$ 
for all such sequences, then $\rho$ is completely additive. 


PROOF. First we prove the direct part of the proposition. For E and $E_n$ as in 
the statement, let $F_1 = E_1$ and $F_n = E_n - E_{n-1}$ for $n \ge 2$. Then $E_n = \bigcup_{k=1}^{n} F_k$ 
disjointly, and $\rho(E_n) = \sum_{k=1}^{n} \rho(F_k)$ by additivity. Also, $E = \bigcup_{k=1}^{\infty} F_k$, and 
complete additivity gives $\rho(E) = \sum_{k=1}^{\infty} \rho(F_k) = \lim \sum_{k=1}^{n} \rho(F_k) = \lim \rho(E_n)$. 
The direct part of the proposition follows. 

For the converse let $\{F_n\}$ be a disjoint sequence in $\mathcal{R}$ with union F in $\mathcal{R}$. Put 
$E_n = \bigcup_{k=1}^{n} F_k$. Then $\{E_n\}$ is an increasing sequence of sets in $\mathcal{R}$ with union F 
in $\mathcal{R}$. We are given that $\rho(F) = \lim \rho(E_n)$, and we have $\rho(E_n) = \sum_{k=1}^{n} \rho(F_k)$ 
by additivity and Proposition 5.1a. Therefore $\rho(F) = \sum_{k=1}^{\infty} \rho(F_k)$, and we 
conclude that $\rho$ is completely additive. 


Corollary 5.3. Let $\rho$ be an additive set function on an algebra $\mathcal{A}$ of subsets of 
X such that $|\rho(X)| < +\infty$. If $\rho$ is completely additive, then $\rho(E) = \lim \rho(E_n)$ 
whenever $\{E_n\}$ is a decreasing sequence of members of $\mathcal{A}$ with intersection E 
in $\mathcal{A}$. Conversely if $\lim \rho(E_n) = 0$ whenever $\{E_n\}$ is a decreasing sequence of 
members of $\mathcal{A}$ with intersection empty, then $\rho$ is completely additive. 


PROOF. This follows from Proposition 5.2 by taking complements. 


A measure is a nonnegative completely additive set function on a σ-ring of 
subsets of X. If no ambiguity is possible about the σ-ring, we may refer to a 
“measure on X.” When we use measures to work with integrals, the σ-ring will 
be taken to be a σ-algebra; if integration were to be defined relative to a σ-ring 
that is not a σ-algebra, then nonzero constant functions would not be measurable. 

The assumption that our σ-ring is a σ-algebra for doing integration is no loss 
of generality. Even when the σ-ring is not a σ-algebra, there is a canonical way 
of extending a measure from a σ-ring to the smallest σ-algebra containing the 
σ-ring. Proposition 5.37 at the end of Section 5 gives the details. 


EXAMPLES. 
(1) For $\{\varnothing, X\}$, define $\mu(X) = a \ge 0$. This is a measure. 

(2) For X equal to a countable set and with all subsets in the σ-algebra, attach 
a weight $\ge 0$ to each member of X. Define $\mu(E)$ to be the sum of the weights 
for the members of E. This is a measure. 

(3) For X arbitrary but nonempty, let $\mu(E)$ be the number of points in E, a 
nonnegative integer or $+\infty$. We refer to $\mu$ as counting measure. 
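
Example 2 and Proposition 5.2 can be seen together in a small computation. The Python sketch below is an illustration under assumptions of our own choosing: X is taken to be the set of positive integers, the point k is given the hypothetical weight $2^{-k}$, and $E_n = \{1, \dots, n\}$ is an increasing sequence with union X. The total weight of X is the geometric sum 1, and the computed values $\mu(E_n)$ increase to that limit.

```python
from fractions import Fraction

def weight(k):
    """Hypothetical weight attached to the point k of X = {1, 2, 3, ...}."""
    return Fraction(1, 2 ** k)

def mu_initial_segment(n):
    """mu(E_n) for E_n = {1, ..., n}: the sum of the weights of its members."""
    return sum((weight(k) for k in range(1, n + 1)), Fraction(0))

values = [mu_initial_segment(n) for n in range(1, 11)]
for n, value in enumerate(values, start=1):
    # The gap to mu(X) = 1 is exactly 2^(-n), so mu(E_n) increases to mu(X).
    print(n, value, 1 - value)
```

For counting measure the same experiment gives $\mu(E_n) = n$, which increases to $+\infty$, the number of points in X.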


(4) Lebesgue measure m on the ring $\mathcal{R}$ of elementary sets of $\mathbb{R}^1$. If E is a 
finite disjoint union of bounded intervals, we let m(E) be the sum of the lengths 
of the intervals. We need to see that this definition is unambiguous. Consider 
the special case that $J = J_1 \cup \cdots \cup J_r$ disjointly with $J_k$ extending from $a_k$ 
to $b_k$, with or without endpoints. Then we can arrange the intervals in order 
so that $b_k = a_{k+1}$ for $k = 1, \dots, r - 1$. In this case, $m(J) = b_r - a_1$ and 
$\sum_{k=1}^{r} m(J_k) = \sum_{k=1}^{r} (b_k - a_k) = b_r - a_1$. Thus the definition is unambiguous in 
this special case. If $E = I_1 \cup \cdots \cup I_r = J_1 \cup \cdots \cup J_s$, then the special case gives 
$m(J_k) = \sum_{j=1}^{r} m(I_j \cap J_k)$ and hence $\sum_{k=1}^{s} m(J_k) = \sum_{j,k} m(I_j \cap J_k)$. Reversing 
the roles of the $I_j$'s and the $J_k$'s, we obtain $\sum_{j=1}^{r} m(I_j) = \sum_{j,k} m(I_j \cap J_k)$. Thus 
$\sum_{k=1}^{s} m(J_k) = \sum_{j=1}^{r} m(I_j)$, and the definition of m on $\mathcal{R}$ is unambiguous. It is 
evident that m is nonnegative and additive. We shall prove that m is completely 
additive on $\mathcal{R}$. Even so, m will not yet be a measure, since $\mathcal{R}$ is not a σ-ring. 
That step will have to be carried out separately. Proving that m is completely 
additive on the ring $\mathcal{R}$ uses the fact that m is regular on $\mathcal{R}$ in the sense that 
if E is in $\mathcal{R}$ and if $\epsilon > 0$ is given, then there exist a compact set K in $\mathcal{R}$ 
and an open set U in $\mathcal{R}$ such that $K \subseteq E \subseteq U$, $m(K) \ge m(E) - \epsilon$, and 
$m(U) \le m(E) + \epsilon$. In the special case that E is a single bounded interval with 
endpoints a and b, we can prove regularity by taking $U = (a - \epsilon/2, b + \epsilon/2)$ 
and by letting $K = \varnothing$ if $b - a < \epsilon$ or $K = [a + \epsilon/2, b - \epsilon/2]$ if $b - a \ge \epsilon$. In 
the general case that E is the disjoint union of n bounded intervals $I_j$, choose $K_j$ and $U_j$ 
for $I_j$ and for the number $\epsilon/n$, and put $K = \bigcup_{j=1}^{n} K_j$ and $U = \bigcup_{j=1}^{n} U_j$. Then 
$m(K) = \sum_{j=1}^{n} m(K_j) \ge \sum_{j=1}^{n} (m(I_j) - \epsilon/n) = m(E) - \epsilon$, and Proposition 
5.1f gives $m(U) \le \sum_{j=1}^{n} m(U_j) \le \sum_{j=1}^{n} (m(I_j) + \epsilon/n) = m(E) + \epsilon$. 
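
The fact that m(E) does not depend on how E is cut into intervals can be tried out numerically. The following Python sketch is an illustration only: the function name `elementary_length` is hypothetical, each interval is represented by its pair of endpoints (whether endpoints belong to the interval does not affect length), and the intervals are allowed to overlap or touch, in which case they are merged before the lengths are added.

```python
from fractions import Fraction

def elementary_length(intervals):
    """Total length of the union of finitely many bounded intervals.

    Each interval is a pair (a, b) with a <= b.
    """
    total = Fraction(0)
    current_start = current_end = None
    for a, b in sorted(intervals):
        if current_end is None or a > current_end:
            # A gap: close off the merged interval built so far.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = a, b
        else:
            # Overlapping or touching: extend the merged interval.
            current_end = max(current_end, b)
    if current_end is not None:
        total += current_end - current_start
    return total

# Two decompositions of the same elementary set [0, 3) u [5, 6].
first = [(0, 1), (1, 3), (5, 6)]
second = [(0, 3), (5, Fraction(11, 2)), (Fraction(11, 2), 6)]
print(elementary_length(first), elementary_length(second))  # 4 4
```

Both decompositions give the value 4, in line with the argument above that the sum of the lengths is independent of the decomposition.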


Proposition 5.4. Lebesgue measure m is completely additive on the ring $\mathcal{R}$ 
of elementary sets in $\mathbb{R}^1$. 


PROOF. Let $\{E_n\}$ be a disjoint sequence in $\mathcal{R}$ with union E in $\mathcal{R}$. Since 
m is nonnegative and additive, Proposition 5.1 gives $m(E) \ge m\bigl(\bigcup_{k=1}^{n} E_k\bigr) = 
\sum_{k=1}^{n} m(E_k)$ for every n. Passing to the limit, we obtain $m(E) \ge \sum_{k=1}^{\infty} m(E_k)$. 
For the reverse inequality, let $\epsilon > 0$ be given. Choose by regularity a compact 
member K of $\mathcal{R}$ and open members $U_n$ of $\mathcal{R}$ such that $K \subseteq E$, $U_n \supseteq E_n$ for all 
n, $m(K) \ge m(E) - \epsilon$, and $m(U_n) \le m(E_n) + \epsilon/2^n$. Then $K \subseteq \bigcup_{n=1}^{\infty} U_n$, and 
the compactness implies that $K \subseteq \bigcup_{n=1}^{N} U_n$ for some N. Hence $m(E) - \epsilon \le 
m(K) \le \sum_{n=1}^{N} m(U_n) \le \sum_{n=1}^{N} (m(E_n) + \epsilon/2^n) \le \sum_{n=1}^{\infty} m(E_n) + \epsilon$. Since $\epsilon$ is 
arbitrary, $m(E) \le \sum_{n=1}^{\infty} m(E_n)$, and the proposition follows. 
The smallest σ-ring containing the ring $\mathcal{R}$ of elementary sets in $\mathbb{R}^1$ is in 
fact a σ-algebra, since $\mathbb{R}^1$ is the countable union of bounded intervals. For 
Lebesgue measure to be truly useful, it must be extended from $\mathcal{R}$ to this σ-algebra, 
whose members are called the Borel sets of $\mathbb{R}^1$. Borel sets of $\mathbb{R}^1$ can be fairly 
complicated. Each open set is a Borel set because it is the countable union of 
bounded open intervals. Each closed set is a Borel set, being the complement 
of an open set, and each compact set is a Borel set because compact subsets of 
$\mathbb{R}^1$ are closed. In addition, any countable set, such as the set $\mathbb{Q}$ of rationals, is a 
Borel set as the countable union of one-point sets. 


The extension is carried out by the general Extension Theorem that will be 
stated now and will be proved in Section 5. The theorem gives both existence 
and uniqueness for an extension, but not without an additional hypothesis. The 
need for an additional hypothesis to ensure uniqueness is closely related to the 
need to assume some finiteness condition on $\rho$ in Corollary 5.3: even though each 
member of a decreasing sequence of sets has infinite measure, the intersection 
of the sets need not have infinite measure. To see what can go wrong for the 
Extension Theorem, consider the ring $\mathcal{R}'$ of subsets of $\mathbb{R}^1$ consisting of all finite 
unions of bounded intervals with rational endpoints; the individual intervals may 
or may not contain their endpoints. If a set function $\mu$ is defined on this ring 
by assigning to each set the number of elements in the set, then $\mu$ is completely 
additive. Each interval in $\mathbb{R}^1$ can be obtained as the union of two sets: a countable 
union of intervals with rational endpoints and a countable intersection of intervals 
with rational endpoints. It follows that the smallest σ-ring containing $\mathcal{R}'$ is the 
σ-algebra of all Borel sets. The set function $\mu$ can be extended to the Borel sets 
in more than one way. In fact, each one-point set consisting of a rational must get 
measure 1, but a one-point set consisting of an irrational can be assigned any 
measure. 


The additional hypothesis for the Extension Theorem is that the given nonnegative 
completely additive set function $\nu$ on a ring of sets $\mathcal{R}$ be σ-finite, i.e., that 
any member of $\mathcal{R}$ be contained in the countable union of members of $\mathcal{R}$ on which 
$\nu$ is finite. An obvious sufficient condition for σ-finiteness is that $\nu(E)$ be finite 
for every set in $\mathcal{R}$. This sufficient condition is satisfied by Lebesgue measure on 
the elementary sets, and thus the theorem proves that Lebesgue measure extends 
in a unique fashion to be a measure on the Borel sets. 


The condition of σ-finiteness is less restrictive than a requirement that X be the 
countable union of sets in $\mathcal{R}$ of finite measure, another condition that is satisfied 
in the case of Lebesgue measure. The condition of σ-finiteness on a ring allows 
for some very large measures when all the sets are in a sense generated by the sets 
of finite measure. For example, if $\mathcal{R}$ is the ring of finite subsets of an uncountable 
set and $\nu$ is the counting measure, the σ-finiteness condition is satisfied. In most 
areas of mathematics, these very large measures rarely arise. 
Theorem 5.5 (Extension Theorem). Let $\mathcal{R}$ be a ring of subsets of a nonempty 
set X, and let $\nu$ be a nonnegative completely additive set function on $\mathcal{R}$ that is 
σ-finite on $\mathcal{R}$. Then $\nu$ extends uniquely to a measure $\mu$ on the smallest σ-ring 
containing $\mathcal{R}$. 


A measure space is defined to be a triple $(X, \mathcal{A}, \mu)$, where X is a nonempty 
set, $\mathcal{A}$ is a σ-algebra of subsets of X, and $\mu$ is a measure on X. The measure 
space is finite if $\mu(X) < +\infty$; it is σ-finite if X is the countable union of sets 
on which $\mu$ is finite. The real line, together with the σ-algebra of Borel sets and 
Lebesgue measure, is a σ-finite measure space. 


2. Measurable Functions 


In this section, X denotes a nonempty set, and $\mathcal{A}$ is a σ-algebra of subsets of X. 
The measurable sets are the members of $\mathcal{A}$. 
We say that a function $f : X \to \mathbb{R}^*$ is measurable if 

(i) $f^{-1}([-\infty, c))$ is a measurable set for every real number c. 

Equivalently the measurability of f may be defined by any of the following 
conditions: 

(ii) $f^{-1}([-\infty, c])$ is a measurable set for every real number c, 
(iii) $f^{-1}((c, +\infty])$ is a measurable set for every real number c, 
(iv) $f^{-1}([c, +\infty])$ is a measurable set for every real number c. 

In fact, the implications (i) implies (ii), (ii) implies (iii), (iii) implies (iv), and (iv) 
implies (i) follow from the identities⁴ 

$$f^{-1}([-\infty, c]) = \bigcap_{n=1}^{\infty} f^{-1}\bigl([-\infty, c + \tfrac{1}{n})\bigr),$$

$$f^{-1}((c, +\infty]) = \bigl(f^{-1}([-\infty, c])\bigr)^c,$$

$$f^{-1}([c, +\infty]) = \bigcap_{n=1}^{\infty} f^{-1}\bigl((c - \tfrac{1}{n}, +\infty]\bigr),$$

$$f^{-1}([-\infty, c)) = \bigl(f^{-1}([c, +\infty])\bigr)^c.$$


EXAMPLES. 

(1) If $\mathcal{A} = \{\varnothing, X\}$, then only the constant functions are measurable. 

(2) If $\mathcal{A}$ consists of all subsets of X, then every function from X to $\mathbb{R}^*$ is 
measurable. 


⁴ Manipulations with inverse images of sets are discussed in Section A1 of the appendix. 

(3) If $X = \mathbb{R}^1$ and $\mathcal{A}$ consists of the Borel sets of $\mathbb{R}^1$, the measurable functions 
are often called Borel measurable. Every continuous function is Borel measurable 
by (i) because the inverse image of every open set is open. Any function 
that is 1 on an open or compact set and is 0 off that set is Borel measurable. It is 
shown in Problem 11 at the end of the chapter that not every Riemann integrable 
function (when set equal to 0 off some bounded interval) is Borel measurable. 
However, let us verify that every function that is continuous except at countably 
many points is Borel measurable. In fact, let C be the exceptional countable set. 
The restriction of f to the metric space $\mathbb{R}^1 - C$ is continuous, and hence the inverse 
image in $\mathbb{R}^1 - C$ of any open set $[-\infty, c)$ is open in $\mathbb{R}^1 - C$. Hence the inverse 
image is the countable union of sets $(a, b) - C$, and these are Borel sets. The full 
inverse image in $\mathbb{R}^1$ of $[-\infty, c)$ under f is the union of a countable set and this 
subset of $\mathbb{R}^1 - C$ and hence is a Borel set. 


(4) If $X = \mathbb{R}^1$ and if $\mathcal{A}$ consists of the “Lebesgue measurable sets” in a sense 
to be defined in Section 5, the measurable functions are often called Lebesgue 
measurable. Every Borel measurable function is Lebesgue measurable, and so 
is every Riemann integrable function (when set equal to 0 off some bounded 
interval). 


The next proposition discusses, among other things, functions $f^+$, $f^-$, and 
$|f|$ defined by $f^+(x) = \max\{f(x), 0\}$, $f^-(x) = -\min\{f(x), 0\}$, and $|f|(x) = 
|f(x)|$. Then $f = f^+ - f^-$ and $|f| = f^+ + f^-$. 


Proposition 5.6. 


(a) Constant functions are always measurable. 

(b) If f is measurable, then the inverse image of any interval is measurable. 

(c) If f is measurable, then the inverse image of any open set in $\mathbb{R}^*$ is measurable. 

(d) If f is measurable, then the functions $f^+$, $f^-$, and $|f|$ are measurable. 


PROOF. In (a), the inverse image of a set under a constant function is either $\varnothing$ 
or X and in either case is measurable. In (b), the inverse image of an interval is the 
intersection of two sets of the kind described in (i) through (iv) above and hence 
is measurable. In (c), any open set in $\mathbb{R}^*$ is the countable union of open intervals, 
and the measurability of the inverse image follows from (b) and the closure of $\mathcal{A}$ 
under countable unions. In (d), $(f^+)^{-1}((c, +\infty])$ equals $f^{-1}((c, +\infty])$ if $c \ge 0$ 
and equals X if $c < 0$. The measurability of $f^-$ and $|f|$ is handled similarly. 


Next we deal with measurability of sums and products, allowing for values 
$+\infty$ and $-\infty$. Recall from Section I.1 that multiplication is everywhere defined 
in $\mathbb{R}^*$ and that the product in $\mathbb{R}^*$ of 0 with anything is 0. 

Proposition 5.7. Let f and g be measurable functions, and let a be in $\mathbb{R}$. Then 
af and fg are measurable, and f + g is measurable provided the sum f(x) + g(x) 
is everywhere defined. 


PROOF. For f + g, with $\mathbb{Q}$ denoting the rationals, 

$$(f + g)^{-1}(c, +\infty] = \bigcup_{r \in \mathbb{Q}} f^{-1}(c + r, +\infty] \cap g^{-1}(-r, +\infty].$$

If a = 0, then af = 0, and 0 is measurable. If $a \ne 0$, then 

$$(af)^{-1}(c, +\infty] = \begin{cases} f^{-1}\bigl(\tfrac{c}{a}, +\infty\bigr] & \text{if } a > 0, \\ f^{-1}\bigl[-\infty, \tfrac{c}{a}\bigr) & \text{if } a < 0. \end{cases}$$

If f and g are measurable and are $\ge 0$, then 

$$(fg)^{-1}(c, +\infty] = \begin{cases} \bigcup_{r \in \mathbb{Q},\, r > 0} f^{-1}\bigl(\tfrac{c}{r}, +\infty\bigr] \cap g^{-1}(r, +\infty] & \text{if } c \ge 0, \\ X & \text{if } c < 0. \end{cases}$$

Hence fg is measurable in this special case. In the general case the formula 
$fg = f^+ g^+ + f^- g^- - f^+ g^- - f^- g^+$ exhibits fg as the everywhere-defined 
sum of measurable functions. 


Proposition 5.8. If $\{f_n\}$ is a sequence of measurable functions, then the 
functions 

(a) $\sup_n f_n$, 

(b) $\inf_n f_n$, 

(c) $\limsup_n f_n$, 

(d) $\liminf_n f_n$, 

are all measurable. 

PROOF. For (a) and (b), we have $(\sup_n f_n)^{-1}(c, +\infty] = \bigcup_{n=1}^{\infty} f_n^{-1}(c, +\infty]$ 
and $(\inf_n f_n)^{-1}[-\infty, c) = \bigcup_{n=1}^{\infty} f_n^{-1}[-\infty, c)$. For (c) and (d), we have 
$\limsup_n f_n = \inf_n \sup_{k \ge n} f_k$ and $\liminf_n f_n = \sup_n \inf_{k \ge n} f_k$. 


Corollary 5.9. The pointwise maximum and the pointwise minimum of a 
finite set of measurable functions are both measurable. 

PROOF. These are special cases of (a) and (b) in the proposition. 

Corollary 5.10. If $\{f_n\}$ is a sequence of measurable functions and if $f(x) = 
\lim f_n(x)$ exists in $\mathbb{R}^*$ at every x, then f is measurable. 


PROOF. This is the special case of (c) and (d) in the proposition in which 
$\limsup_n f_n = \liminf_n f_n$. 
The above results show that the set of measurable functions is closed under 
pointwise limits, as well as the arithmetic operations and max and min. Since 
the measurable functions will be the ones we attempt to integrate, we can hope 
for good limit theorems from Lebesgue integration, as well as the familiar results 
about arithmetic operations and ordering properties. 

If E is a subset of X, the indicator function⁵ $I_E$ of E is the function that is 1 
on E and is 0 elsewhere. The set $(I_E)^{-1}(c, +\infty]$ is $\varnothing$ or E or X, depending 
on the value of c. Therefore $I_E$ is a measurable function if and only if E is a 
measurable set. 

A simple function $s : X \to \mathbb{R}^*$ is a function s with finite image contained 
in $\mathbb{R}$. Every simple function s has a unique representation as $s = \sum_{n=1}^{N} c_n I_{E_n}$, 
where the $c_n$ are distinct real numbers and the $E_n$ are disjoint nonempty sets with 
union X. In fact, the set of numbers $c_n$ equals the image of s, and $E_n$ is the 
set where s takes the value $c_n$. This expansion of s will be called the canonical 
expansion of s. The set $s^{-1}(c, +\infty]$ is the union of the sets $E_n$ such that $c < c_n$, 
and it follows that s is a measurable function if and only if all of the sets $E_n$ in 
the canonical expansion are measurable sets. 


Proposition 5.11. For any function $f : X \to [0, +\infty]$, there exists a sequence 
of simple functions $s_n \ge 0$ with the property that for each x in X, $\{s_n(x)\}$ is a 
monotone increasing sequence in $\mathbb{R}$ with limit f(x) in $\mathbb{R}^*$. If f is measurable, 
then the simple functions $s_n$ may be taken to be measurable. 


PROOF. For $1 \le n < \infty$ and $1 \le j \le n2^n$, let 

$$E_{nj} = f^{-1}\Bigl[\frac{j-1}{2^n}, \frac{j}{2^n}\Bigr), \qquad F_n = f^{-1}[n, +\infty], \qquad s_n = \sum_{j=1}^{n2^n} \frac{j-1}{2^n}\, I_{E_{nj}} + n I_{F_n}.$$

Then $\{s_n\}$ has the required properties. 
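
The value of $s_n$ at a point depends only on the value of f at that point, so the construction in the proof can be tried out on a single number. The Python sketch below is an illustration under our own naming: `s_n` takes the value `f_value` of f at a point, in $[0, +\infty]$, and returns the value of the n-th approximating simple function there, read off from the sets $E_{nj}$ and $F_n$ of the proof.

```python
import math
from fractions import Fraction

def s_n(f_value, n):
    """Value of s_n at a point where f takes the value f_value in [0, +inf]."""
    if f_value >= n:
        # The point lies in F_n = f^{-1}[n, +inf], where s_n equals n.
        return Fraction(n)
    # Otherwise the point lies in E_{nj} with (j-1)/2^n <= f_value < j/2^n.
    j = math.floor(f_value * 2 ** n) + 1
    return Fraction(j - 1, 2 ** n)

f_value = Fraction(22, 7)
values = [s_n(f_value, n) for n in range(1, 9)]
for n, value in enumerate(values, start=1):
    print(n, value, f_value - value)
```

For $n > f(x)$ the gap $f(x) - s_n(x)$ is nonnegative and less than $2^{-n}$, which is the reason that $s_n(x)$ converges to $f(x)$ when $f(x)$ is finite; when $f(x) = +\infty$, the value $s_n(x) = n$ tends to $+\infty$.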


By convention from now on, simple functions will always be understood to be 
measurable. 


3. Lebesgue Integral 


Throughout this section, $(X, \mathcal{A}, \mu)$ denotes a measure space. The measurable sets 
continue to be those in $\mathcal{A}$. Our objective in this section is to define the Lebesgue 
integral. We defer any systematic discussion of properties of the integral to 
Section 4. 


⁵ As noted in Chapter III, indicator functions are called “characteristic functions” by many authors, 
but the term “characteristic function” has another meaning in probability theory and is best avoided 
as a substitute for “indicator function” in any context where probability might play a role. 

Just as with the Riemann integral, the Lebesgue integral is defined by means 
of an approximation process. In the case of the Riemann integral, the process 
is to use upper sums and lower sums, which capture an approximate value of an 
integral by adding contributions influenced by proximity in the domain of the 
integrand. The process is qualitatively different for the Lebesgue integral, which 
captures an approximate value of an integral by adding contributions based on 
what happens in the image of the integrand. 

Let s be a simple function $\ge 0$. By our convention at the end of the previous 
section, we have incorporated measurability into the definition of simple function. 
Let E be a measurable set, and let $s = \sum_{n=1}^{N} c_n I_{A_n}$ be the canonical expansion of 
s. We define $\mathcal{I}_E(s) = \sum_{n=1}^{N} c_n \mu(A_n \cap E)$. This kind of object will be what we 
use as an approximation in the definition of the Lebesgue integral; the formula 
shows the sense in which $\mathcal{I}_E(s)$ is built from the image of the integrand. 
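
The definition of $\mathcal{I}_E(s)$ can be carried out mechanically when X is finite. The Python sketch below is an illustration under assumptions of our own: X is a four-point set carrying hypothetical weights as in Example 2 of Section 1, every subset is measurable, a simple function is stored as a dictionary of its values, and the names `mu`, `canonical_expansion`, and `I` are not from the text.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical weights defining a measure on all subsets of X = {1, 2, 3, 4}.
weights = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(2), 4: Fraction(1, 6)}

def mu(S):
    return sum((weights[x] for x in S), Fraction(0))

def canonical_expansion(s):
    """Map each value c_n in the image of s to the set A_n where s equals c_n."""
    level_sets = defaultdict(set)
    for x, value in s.items():
        level_sets[value].add(x)
    return dict(level_sets)

def I(E, s):
    """I_E(s): the sum of c_n * mu(A_n intersected with E) over the canonical expansion."""
    return sum((c * mu(A & E) for c, A in canonical_expansion(s).items()), Fraction(0))

s = {1: 3, 2: 0, 3: 3, 4: 7}   # a simple function >= 0 with image {0, 3, 7}
E = {1, 2, 4}
print(I(E, s))  # 8/3
```

Here the sum is organized by the values 0, 3, and 7 that s takes rather than by the points of X: the contributions are $3 \cdot \mu(\{1\}) + 0 \cdot \mu(\{2\}) + 7 \cdot \mu(\{4\}) = 3/2 + 0 + 7/6 = 8/3$.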

If $f \ge 0$ is a measurable function and E is a measurable set, we define the 
Lebesgue integral of f on the set E with respect to the measure $\mu$ to be 

$$\int_E f\,d\mu = \int_E f(x)\,d\mu(x) = \sup_{\substack{0 \le s \le f, \\ s \text{ simple}}} \mathcal{I}_E(s).$$

This is well defined as a member of $\mathbb{R}^*$ without restriction as long as E is a 
measurable set and the measurable function f is $\ge 0$ everywhere on X. It is 
evident in this case that $\int_E f\,d\mu \ge 0$ and that $\int_E 0\,d\mu = 0$. 
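
The supremum in the definition can be approached along the simple functions $s_n$ of Proposition 5.11, and a short computation shows the partitioning of the range at work. The Python sketch below is an illustration under stated assumptions, not a statement from the text: we take $f(x) = x^2$ on $E = [0, 1)$ with Lebesgue measure, we use that the Lebesgue measure of an interval is its length, and the name `lebesgue_sum` is hypothetical. Since $0 \le f < 1$ on E, only the sets $E_{nj}$ with $j \le 2^n$ meet E, and each such intersection is the interval $[\sqrt{(j-1)/2^n}, \sqrt{j/2^n})$.

```python
import math

def lebesgue_sum(n):
    """I_E(s_n) for f(x) = x^2 on E = [0, 1), with s_n as in Proposition 5.11."""
    total = 0.0
    for j in range(1, 2 ** n + 1):
        lower = (j - 1) / 2 ** n
        upper = j / 2 ** n
        # E_{nj} meets [0, 1) in [sqrt(lower), sqrt(upper)), an interval of this length.
        length = math.sqrt(upper) - math.sqrt(lower)
        # s_n takes the value `lower` on that interval.
        total += lower * length
    return total

approximations = [lebesgue_sum(n) for n in range(1, 13)]
for n, value in enumerate(approximations, start=1):
    print(n, value)
```

The printed values increase with n. They stay below 1/3, which is the value of the Riemann integral of $x^2$ over [0, 1], and they differ from it by less than $2^{-n}$, because $0 \le f - s_n < 2^{-n}$ on E and E has measure 1.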

For a general measurable function f, not necessarily $\ge 0$, the integral may or 
may not be defined. We write $f = f^+ - f^-$. The functions $f^+$ and $f^-$ are 
$\ge 0$ and are measurable by Proposition 5.6d, and consequently $\int_E f^+\,d\mu$ and 
$\int_E f^-\,d\mu$ are well-defined members of $\mathbb{R}^*$. If $\int_E f^+\,d\mu$ and $\int_E f^-\,d\mu$ are not 
both infinite, then we define 

$$\int_E f\,d\mu = \int_E f(x)\,d\mu(x) = \int_E f^+\,d\mu - \int_E f^-\,d\mu.$$

This definition is consistent with the definition in the special case $f \ge 0$, since 
such an f has $f^- = 0$ and therefore $\int_E f^-\,d\mu = 0$. We say that f is integrable 
if $\int_E f^+\,d\mu$ and $\int_E f^-\,d\mu$ are both finite. In this case the subsets of E where 
f is $+\infty$ and where f is $-\infty$ have measure 0. In fact, if S is the subset of E 
where $f^+$ is $+\infty$, then the inequality $\int_E f^+\,d\mu \ge \mathcal{I}_E(C I_S) = C\mu(S)$ for every 
$C > 0$ shows that $\mu(S) \le C^{-1} \int_E f^+\,d\mu$ for every C; hence $\mu(S) = 0$. A 
similar argument applies to the set where $f^-$ is $+\infty$. 

We shall give some examples of integration after showing that the definition 
of $\int_E f\,d\mu$ reduces to $\mathcal{I}_E(f)$ if f is nonnegative and simple. The first lemma 
below will make use of the additivity of $\mu$, and the second lemma will make use 
of the fact that $\mu$ is nonnegative. 


Lemma 5.12. Let $s = \sum_{n=1}^{N} c_n I_{A_n}$ be the canonical expansion of a simple 
function $\ge 0$, and let $s = \sum_{m=1}^{M} d_m I_{B_m}$ be another expansion in which the $d_m$ are 
$\ge 0$ and the $B_m$ are disjoint and measurable. Then $\mathcal{I}_E(s) = \sum_{m=1}^{M} d_m \mu(B_m \cap E)$. 


PROOF. Adjoin the term $0 \cdot I_{(\bigcup_m B_m)^c}$ to the second expansion, if necessary, to 
make $\bigcup_{m=1}^{M} B_m = X$. Without loss of generality, we may assume that no $B_m$ 
is empty. Then the fact that the sets $B_m$ are disjoint and nonempty with union 
X implies that the image of s is $\{d_1, \dots, d_M\}$. Thus we can write $d_m = c_{n(m)}$ 
for each m. Since $A_n = s^{-1}(\{c_n\})$, we see that $B_m \subseteq A_{n(m)}$. Since the $B_m$ are 
disjoint with union X, we obtain 

$$A_k = \bigcup_{\{m \,\mid\, n(m) = k\}} B_m$$

disjointly. The additivity of $\mu$ gives $\mu(A_k \cap E) = \sum_{\{m \mid n(m) = k\}} \mu(B_m \cap E)$, and 
thus $c_k \mu(A_k \cap E) = \sum_{\{m \mid n(m) = k\}} d_m \mu(B_m \cap E)$. Summing on k, we obtain the 
conclusion of the lemma. 


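Lemma 5.13. If s and t are nonnegative simple functions and if $t \le s$ on E, 
then $\mathcal{I}_E(t) \le \mathcal{I}_E(s)$. 


PROOF. If $s = \sum_{j=1}^{J} c_j I_{A_j}$ and $t = \sum_{k=1}^{K} d_k I_{B_k}$ are the canonical expansions 
of s and t, then $\bigcup_{j,k} (A_j \cap B_k) = X$ disjointly. Hence we can write 

$$s = \sum_{j,k} c_j I_{A_j \cap B_k} \qquad\text{and}\qquad t = \sum_{j,k} d_k I_{A_j \cap B_k}.$$

Lemma 5.12 shows that 

$$\mathcal{I}_E(s) = \sum_{j,k} c_j \mu(A_j \cap B_k \cap E) \qquad\text{and}\qquad \mathcal{I}_E(t) = \sum_{j,k} d_k \mu(A_j \cap B_k \cap E).$$

We now have term-by-term inequality: either $\mu(A_j \cap B_k \cap E) = 0$ for a term, or 
$A_j \cap B_k \cap E \ne \varnothing$ and any x in $A_j \cap B_k \cap E$ has $t(x) \le s(x)$ and exhibits $d_k \le c_j$. 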


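Proposition 5.14. If $s \ge 0$ is a simple function, then $\int_E s\,d\mu = \mathcal{I}_E(s)$ for 
every measurable set E. 


PROOF. If t is a simple function with $0 \le t \le s$ everywhere, then Lemma 
5.13 gives $\mathcal{I}_E(t) \le \mathcal{I}_E(s)$. Hence $\int_E s\,d\mu = \sup_{0 \le t \le s} \mathcal{I}_E(t) \le \mathcal{I}_E(s)$. On 
the other hand, we certainly have $\mathcal{I}_E(s) \le \sup_{0 \le t \le s} \mathcal{I}_E(t) = \int_E s\,d\mu$, and thus 
$\int_E s\,d\mu = \mathcal{I}_E(s)$. 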
EXAMPLES. 


(1) Let $\mathcal{A} = \{\varnothing, X\}$ and $\mu(X) = 1$. Only the constant functions are measurable, 
and $\int_{\varnothing} c\,d\mu = 0$ and $\int_X c\,d\mu = c$. 

(2) Let X be a nonempty countable set, let $\mathcal{A}$ consist of all subsets of X, and 
let $\mu$ be defined by nonnegative finite weights $w_i$ attached to each point i in X. 
If $f = \{f_i\}$ is a real-valued function, then the integral of f over X is $\sum f_i w_i$ 
provided the integrals of $f^+$ and $f^-$ are not both infinite, i.e., provided every 
rearrangement of the series $\sum f_i w_i$ converges in $\mathbb{R}^*$ to the same sum. By contrast, 
f is integrable if and only if the series $\sum f_i w_i$ is absolutely convergent. In the 
special case that all the weights $w_i$ are 1, the theory of the Lebesgue integral over 
X reduces to the theory of infinite series for which every rearrangement of the 
series converges in $\mathbb{R}^*$ to the same sum. This is a very important special case for 
testing the validity of general assertions about Lebesgue integration. 

(3) Let $(X, \mathcal{A}, \mu)$ be the real line $\mathbb{R}^1$ with $\mathcal{A}$ consisting of the Borel sets and with 
$\mu$ equal to Lebesgue measure m. Recall that real-valued continuous functions on 
$\mathbb{R}^1$ are measurable. For such a function f, the assertion is that 

$$\int_{[a,x)} f\,dm = \int_a^x f(t)\,dt,$$

the left side being a Lebesgue integral and the right side being a Riemann integral. 
Proving this assertion involves using some properties of the Lebesgue integral 
that will be proved in the next section. We give the argument now before these 
properties have been established, in order to emphasize the importance of each 
of these properties: If $h > 0$, then 

$$\frac{1}{h}\Bigl[\int_{[a,x+h)} f\,dm - \int_{[a,x)} f\,dm\Bigr] - f(x) = \frac{1}{h}\int_{[x,x+h)} f\,dm - f(x) = \frac{1}{h}\int_{[x,x+h)} [f - f(x)]\,dm.$$

The absolute value of the left side is then 

$$\le \frac{1}{h}\int_{[x,x+h)} |f - f(x)|\,dm \le \frac{1}{h}\sup_{t \in [x,x+h)} |f(t) - f(x)|\; m([x,x+h)) = \sup_{t \in [x,x+h)} |f(t) - f(x)|,$$

and the right side tends to 0 as h decreases to 0, by continuity of f at x. If $h < 0$, 
then the argument corresponding to the first display is 

$$\frac{1}{h}\Bigl[\int_{[a,x+h)} f\,dm - \int_{[a,x)} f\,dm\Bigr] - f(x) = \frac{1}{|h|}\int_{[x-|h|,x)} f\,dm - f(x) = \frac{1}{|h|}\int_{[x-|h|,x)} [f - f(x)]\,dm.$$

The absolute value of the left side is then $\le \sup_{t \in [x-|h|,x)} |f(t) - f(x)|$, and 
this tends to 0 as h increases to 0, by continuity of f at x. We conclude that 
$\int_{[a,x)} f\,dm$ is differentiable with derivative f. By the Fundamental Theorem of 
Calculus for the Riemann integral, together with a corollary of the Mean Value 
Theorem, $\int_{[a,x)} f\,dm = \int_a^x f(t)\,dt + c$ for all x and some constant c. Putting 
x = a, we see that c = 0. Therefore the Riemann and Lebesgue integrals coincide 
for continuous functions on bounded intervals [a, b). 


4. Properties of the Integral 


In this section, $(X, \mathcal{A}, \mu)$ continues to denote a measure space. Our objective 
is to establish basic properties of the Lebesgue integral, including properties 
that indicate how Lebesgue integration interacts with passages to the limit. The 
properties that we establish will include all remaining properties needed to justify 
the argument in Example 3 at the end of the previous section. 


Proposition 5.15. The Lebesgue integral has these four properties: 

(a) If f is a measurable function and $\mu(E) = 0$, then $\int_E f\,d\mu = 0$. 

(b) If E and F are measurable sets with $F \subseteq E$ and if f is a measurable function, 
then $\int_F f^+\,d\mu \le \int_E f^+\,d\mu$ and $\int_F f^-\,d\mu \le \int_E f^-\,d\mu$. Consequently, if 
$\int_E f\,d\mu$ is defined, then so is $\int_F f\,d\mu$. 

(c) If c is a constant function with its value in $\mathbb{R}^*$, then $\int_E c\,d\mu = c\mu(E)$. 

(d) If $\int_E f\,d\mu$ is defined and if c is in $\mathbb{R}$, then $\int_E cf\,d\mu$ is defined and 
$\int_E cf\,d\mu = c\int_E f\,d\mu$. If f is integrable on E, then so is cf. 


PROOF. In (a), it is enough to deal with $f^+$ and $f^-$ separately, and then it is 
enough to handle $s \ge 0$ simple. For such an s, Proposition 5.14 says that the 
integral equals $\mathcal{I}_E(s)$, and the definition shows that this is 0. In (b), Proposition 
5.14 makes it clear that the inequalities are valid for any simple function $\ge 0$, 
and then the general case follows by taking the supremum first for $0 \le s \le f^+$ 
and then for $0 \le s \le f^-$. In (c), if $0 \le c < +\infty$, then c is simple, and the 
integral equals $\mathcal{I}_E(c) = c\mu(E)$ by Proposition 5.14. If $c = +\infty$, then the case 
$\mu(E) = 0$ follows from (a) and the case $\mu(E) > 0$ is handled by the observations 
that $\int_E c\,d\mu \ge \mathcal{I}_E(n) = n\mu(E)$ and that the right side tends to $+\infty$ as n tends 
to $+\infty$. For $c < 0$, we have $\int_E c\,d\mu = -\int_E (-c)\,d\mu$ by definition, and then 
the result follows from the previous cases. In (d), we may assume, without loss 
of generality, that $f \ge 0$ and $c > 0$. Then $\int_E cf\,d\mu = \sup_{0 \le s \le cf} \mathcal{I}_E(s) = 
\sup_{0 \le ct \le cf} \mathcal{I}_E(ct) = c \sup_{0 \le t \le f} \mathcal{I}_E(t) = c\int_E f\,d\mu$, and (d) is proved. 


Proposition 5.16. If f and g are measurable functions, if their integrals over 
$E$ are defined, and if $f(x) \le g(x)$ on $E$, then $\int_E f\,d\mu \le \int_E g\,d\mu$.


REMARK. Observe that the inequality $f(x) \le g(x)$ is assumed only on $E$,
despite the definitions that take into account values of a function everywhere on 
X. This “localization” property of the integral is as one wants it to be. 


PROOF. First suppose that $f \ge 0$ and $g \ge 0$. If $s$ is any simple function with
$0 \le s \le f$, define $t$ to equal $s$ on $E$ and to equal 0 off $E$. Then $0 \le t \le g$, and
Lemma 5.13 gives $\mathcal{I}_E(s) = \mathcal{I}_E(t) \le \int_E g\,d\mu$. Hence $\int_E f\,d\mu \le \int_E g\,d\mu$ when
$f \ge 0$ and $g \ge 0$.

In the general case the inequality $f(x) \le g(x)$ on $E$ implies that $f^+(x) \le g^+(x)$
on $E$ and $f^-(x) \ge g^-(x)$ on $E$. The special case gives $\int_E f^+\,d\mu \le \int_E g^+\,d\mu$
and $\int_E f^-\,d\mu \ge \int_E g^-\,d\mu$. Subtracting these inequalities, we obtain
the desired result.


Corollary 5.17. If $f$ and $g$ are measurable functions that are equal on $E$ and
if $\int_E f\,d\mu$ is defined, then $\int_E g\,d\mu$ is defined and $\int_E f\,d\mu = \int_E g\,d\mu$.


PROOF. Apply Proposition 5.16 to the following inequalities on $E$, and then
sort out the results: $f^+ \le g^+$, $f^+ \ge g^+$, $f^- \le g^-$, and $f^- \ge g^-$.


Corollary 5.18. If f is a measurable function, then f is integrable on E if 
either 


(a) there is a function $g$ integrable on $E$ such that $|f(x)| \le g(x)$ on $E$, or
(b) $\mu(E)$ is finite and there is a real number $c$ such that $|f(x)| \le c$ on $E$.


PROOF. For (a), apply Proposition 5.16 to the inequalities $f^+ \le g$ and $f^- \le g$
valid on $E$. For (b), use the formula for $\int_E c\,d\mu$ in Proposition 5.15c and apply (a).


We turn our attention now to properties that indicate how Lebesgue integration 
interacts with passages to the limit. These make essential use of the complete 
additivity of the measure $\mu$. We shall bring this hypothesis to bear initially through
the following theorem. 


Theorem 5.19. Let $f$ be a fixed measurable function, and suppose that $\int_X f\,d\mu$
is defined. Then the set function $\rho(E) = \int_E f\,d\mu$ is completely additive.


PROOF. We have $\rho(\varnothing) = 0$ by Proposition 5.15a, since $\mu(\varnothing) = 0$. We shall
prove that if $f \ge 0$, then $\rho$ is completely additive. The general case follows
from this by applying the result to $f^+$ and $f^-$ separately and by using the fact
that $\int_X f^+\,d\mu$ and $\int_X f^-\,d\mu$ are not both infinite. Thus we are to show that if
$E = \bigcup_{n=1}^\infty E_n$ disjointly and if $f \ge 0$, then $\rho(E) = \sum_{n=1}^\infty \rho(E_n)$.

For simple $s \ge 0$ with canonical expansion $s = \sum_{n=1}^N c_n I_{A_n}$, the identity
$\mathcal{I}_F(s) = \sum_{n=1}^N c_n \mu(A_n \cap F)$ and the complete additivity of $\mu$ show that $\mathcal{I}_F(s)$ is
a completely additive function of the set $F$. Thus for $s$ simple with $0 \le s \le f$,
we have

$$\mathcal{I}_E(s) = \sum_{n=1}^\infty \mathcal{I}_{E_n}(s) \le \sum_{n=1}^\infty \rho(E_n).$$

Hence

$$\rho(E) = \sup_{0 \le s \le f} \mathcal{I}_E(s) \le \sum_{n=1}^\infty \rho(E_n).$$

We now prove the reverse inequality. By Proposition 5.15b, $\rho(E) \ge \rho(E_n)$
for every $n$, since $f = f^+$. Hence if $\rho(E_n) = +\infty$ for any $n$, the desired result
is proved. Thus assume that $\rho(E_n) < +\infty$ for all $n$. Let $\epsilon > 0$ be given, and
choose simple functions $t$ and $u$ that are $\ge 0$ and are $\le f$ and have

$$\mathcal{I}_{E_1}(t) \ge \int_{E_1} f\,d\mu - \epsilon \quad\text{and}\quad \mathcal{I}_{E_2}(u) \ge \int_{E_2} f\,d\mu - \epsilon.$$

Let $s$ be the pointwise maximum $s = \max\{t, u\}$. Then $s$ is simple, and Lemma
5.13 gives $\mathcal{I}_{E_1}(s) \ge \mathcal{I}_{E_1}(t)$ and $\mathcal{I}_{E_2}(s) \ge \mathcal{I}_{E_2}(u)$. Consequently

$$\begin{aligned}
\rho(E_1 \cup E_2) &= \int_{E_1 \cup E_2} f\,d\mu \ge \mathcal{I}_{E_1 \cup E_2}(s) = \mathcal{I}_{E_1}(s) + \mathcal{I}_{E_2}(s) \\
&\ge \mathcal{I}_{E_1}(t) + \mathcal{I}_{E_2}(u) \ge \int_{E_1} f\,d\mu + \int_{E_2} f\,d\mu - 2\epsilon \\
&= \rho(E_1) + \rho(E_2) - 2\epsilon.
\end{aligned}$$

Since $\epsilon$ is arbitrary, $\rho(E_1 \cup E_2) \ge \rho(E_1) + \rho(E_2)$. By induction, we obtain
$\rho(E_1 \cup \cdots \cup E_n) \ge \rho(E_1) + \cdots + \rho(E_n)$ for every $n$, and thus
$\rho(E) \ge \rho(E_1) + \cdots + \rho(E_n)$ by another application of Proposition 5.15b. Therefore
$\rho(E) \ge \sum_{n=1}^\infty \rho(E_n)$, and the reverse inequality has been proved.


We give five corollaries that are consequences of Corollary 5.17 and Theorem 
5.19. The first three make use only of additivity, not of complete additivity. 


Corollary 5.20. If $\int_E f\,d\mu$ is defined, then $\int_X I_E f\,d\mu$ is defined and equals
$\int_E f\,d\mu$.

PROOF. It is sufficient to handle $f^+$ and $f^-$ separately. Then both integrals are
defined, and

$$\int_E f\,d\mu = \int_E I_E f\,d\mu + \int_{E^c} 0\,d\mu = \int_E I_E f\,d\mu + \int_{E^c} I_E f\,d\mu = \int_X I_E f\,d\mu.$$


Corollary 5.21. If $\int_E f\,d\mu$ is defined, then $\left|\int_E f\,d\mu\right| \le \int_E |f|\,d\mu$. If $f$ is
integrable on $E$, so is $|f|$.


PROOF. Let $E_1 = E \cap f^{-1}([0, +\infty])$ and $E_2 = E \cap f^{-1}([-\infty, 0))$. Then
use of the triangle inequality gives

$$\begin{aligned}
\left|\int_E f\,d\mu\right| &= \left|\int_{E_1} f^+\,d\mu - \int_{E_2} f^-\,d\mu\right| \le \int_{E_1} f^+\,d\mu + \int_{E_2} f^-\,d\mu \\
&= \int_{E_1} |f|\,d\mu + \int_{E_2} |f|\,d\mu = \int_E |f|\,d\mu.
\end{aligned}$$

If $f$ is integrable on $E$, both $\int_{E_1} f^+\,d\mu$ and $\int_{E_2} f^-\,d\mu$ are finite. Their sum is
$\int_E |f|\,d\mu$.


Corollary 5.22. If $f$ is a measurable function and $\mu(E \,\Delta\, F) = 0$, then
$\int_E f\,d\mu = \int_F f\,d\mu$, provided one of the integrals exists.


PROOF. Without loss of generality, we may assume that $f \ge 0$. Then both
integrals are defined. Since $E \,\Delta\, F = (E - F) \cup (F - E)$, we have $\mu(E - F) =
\mu(F - E) = 0$. Then Theorem 5.19 and Proposition 5.15a give

$$\int_E f\,d\mu = \int_{E-F} f\,d\mu + \int_{E \cap F} f\,d\mu = 0 + \int_{E \cap F} f\,d\mu = \int_{F-E} f\,d\mu + \int_{E \cap F} f\,d\mu = \int_F f\,d\mu.$$


Corollary 5.23. If $f$ is a measurable function and if the set $A = \{x \mid f(x) \ne 0\}$
has $\mu(A) = 0$, then $\int_X f\,d\mu = 0$. Conversely if $f$ is measurable, is $\ge 0$, and has
$\int_X f\,d\mu = 0$, then $A = \{x \mid f(x) \ne 0\}$ has $\mu(A) = 0$.


REMARKS. When a set where some condition fails to hold has measure 0, one
sometimes says that the condition holds almost everywhere, or a.e., or at almost
every point. If there is any ambiguity about what measure is being referred
to, one says “a.e. $[d\mu]$.” Thus the conclusion in the converse half of the above
corollary is that $f$ is zero a.e. $[d\mu]$.


PROOF. For the first statement, Corollary 5.20 gives $\int_X f\,d\mu = \int_X I_A f\,d\mu =
\int_A f\,d\mu = 0$. Conversely let $A_n = f^{-1}([\tfrac{1}{n}, +\infty])$. This is a measurable
set. Since $f$ is $\ge 0$, $A = \bigcup_{n=1}^\infty A_n$. Proposition 5.1g and complete additivity
of $\mu$ give $\mu(A) \le \sum_{n=1}^\infty \mu(A_n)$. If $\mu(A_n) > 0$ for some $n$, then

$$\int_X f\,d\mu = \int_{A_n} f\,d\mu + \int_{A_n^c} f\,d\mu \ge \int_{A_n} \tfrac{1}{n}\,d\mu = \tfrac{1}{n}\,\mu(A_n) > 0,$$

and we obtain a contradiction.
We conclude that $\mu(A_n) = 0$ for all $n$ and hence that $\mu(A) = 0$.


Corollary 5.24. If $f \ge 0$ is an integrable function on $X$, then for any $\epsilon > 0$,
there exists a $\delta > 0$ such that $\int_E f\,d\mu \le \epsilon$ for every measurable set $E$ with
$\mu(E) \le \delta$.


PROOF. Let $\epsilon > 0$ be given. If $N > 0$ is an integer, then the sets $S_N =
\{x \in X \mid f(x) \ge N\}$ form a decreasing sequence whose intersection is $S =
\{x \in X \mid f(x) = +\infty\}$. Since $f$ is integrable, $\mu(S) = 0$ and therefore $\int_S f\,d\mu =
0$. The finiteness of $\int_X f\,d\mu$, together with Corollary 5.3 and the complete
additivity of $E \mapsto \int_E f\,d\mu$ given in Theorem 5.19, implies that $\lim_N \int_{S_N} f\,d\mu =
0$. Choose $N$ large enough so that $\int_{S_N} f\,d\mu \le \epsilon/2$, and then choose $\delta = \epsilon/(2N)$.
If $\mu(E) \le \delta$, then

$$\begin{aligned}
\int_E f\,d\mu &= \int_{S_N \cap E} f\,d\mu + \int_{S_N^c \cap E} f\,d\mu \\
&\le \int_{S_N} f\,d\mu + \int_{S_N^c \cap E} N\,d\mu \le \epsilon/2 + N\mu(E) \le \epsilon/2 + \epsilon/2 = \epsilon,
\end{aligned}$$

and the proof is complete.


In a number of the remaining results in the section, a sequence $\{f_n\}$ of measurable
functions converges pointwise to a function $f$. Corollary 5.10 assures
us that $f$ is measurable. Suppose that $\int_E f_n\,d\mu$ exists for each $n$. Is it true that
$\int_E f\,d\mu$ exists, is it true that $\lim_n \int_E f_n\,d\mu$ exists, and if both exist, are they
equal? Once again we encounter an interchange-of-limits problem, and there 
is no surprise from the general fact: all three answers can be “no” in particular 
cases. Examples of the failure of the limit of the integral to equal the integral of 
the limit are given below. After giving the examples, we shall discuss theorems 
that give “yes” answers under additional hypotheses. 


EXAMPLES.
(1) Let $X$ be the set of positive integers, let $\mathcal{A}$ consist of all subsets of $X$, and
let $\mu$ be counting measure. A measurable function $f$ is a sequence $\{f(k)\}$ with
values in $\mathbb{R}^*$. Define a sequence $\{f_n\}$ of measurable functions for $n \ge 1$ by taking

$$f_n(k) = \begin{cases} 1/n & \text{if } k \le n, \\ 0 & \text{if } k > n. \end{cases}$$

Then $\int_X f_n\,d\mu = 1$ for all $n$, $\lim f_n = 0$ pointwise, and

$$\int_X \lim f_n\,d\mu < \lim \int_X f_n\,d\mu.$$

(2) Let the measure space be $X = \mathbb{R}^1$ with the Borel sets and Lebesgue measure
$m$. Define

$$f_n(x) = \begin{cases} n & \text{for } 0 < x < 1/n, \\ 0 & \text{otherwise.} \end{cases}$$

Then the same phenomenon results, and everything of interest is taking place
within $[0, 1]$. So the difficulty in the previous example does not result from the
fact that $X$ has infinite measure.
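

Both examples can be checked by direct computation. The following is a minimal Python sketch, assuming exact rational arithmetic from the standard `fractions` module; the function names are hypothetical and chosen only for this illustration. In each example the integrals stay equal to 1 while the values at any fixed point tend to 0.

```python
from fractions import Fraction

def f_example1(n, k):
    """Value f_n(k) from Example 1: 1/n for k <= n, and 0 for k > n."""
    return Fraction(1, n) if k <= n else Fraction(0)

def integral_example1(n):
    """Integral of f_n for counting measure on the positive integers.

    Only the first n values are nonzero, so the integral is a finite sum.
    """
    return sum(f_example1(n, k) for k in range(1, n + 1))

def integral_example2(n):
    """Integral of f_n = n on (0, 1/n), 0 elsewhere, for Lebesgue measure.

    f_n is simple, so the integral is (value) * (length of the interval).
    """
    return n * Fraction(1, n)

for n in (1, 10, 1000):
    # The integrals do not move, but f_n(3) shrinks toward 0 as n grows.
    print(n, integral_example1(n), integral_example2(n), f_example1(n, 3))
```

The pointwise limit is the zero function in both cases, so the integral of the limit is 0 while the limit of the integrals is 1.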

Theorem 5.25 (Monotone Convergence Theorem). Let $E$ be a measurable
set, and suppose that $\{f_n\}$ is a sequence of measurable functions that satisfy

$$0 \le f_1(x) \le f_2(x) \le \cdots \le f_n(x) \le \cdots$$

for all $x$. Put $f(x) = \lim_n f_n(x)$, the limit being taken in $\mathbb{R}^*$. Then $\int_E f\,d\mu$ and
$\lim_n \int_E f_n\,d\mu$ both exist, and

$$\int_E f\,d\mu = \lim_{n \to \infty} \int_E f_n\,d\mu.$$


REMARKS. This theorem generalizes Corollary 1.14, which is the special case 
of the Monotone Convergence Theorem in which X is the set of positive integers, 
every subset is measurable, and $\mu$ is counting measure. In the general setting
of the Monotone Convergence Theorem, one of the by-products of the theorem
is that we obtain an easier way of dealing with the definition of $\int_E f\,d\mu$ for
$f \ge 0$. Instead of using the totality of simple functions between 0 and $f$, we
may use a single increasing sequence with pointwise limit $f$, such as the one given by
Proposition 5.11. The proof of Proposition 5.26 below will illustrate how we can 
take advantage of this fact. 


PROOF. Since $f$ is the pointwise limit of measurable functions and is $\ge 0$, $f$
is measurable and $\int_E f\,d\mu$ exists in $\mathbb{R}^*$. Since $\{f_n(x)\}$ is monotone increasing
in $n$, the same is true of $\{\int_E f_n\,d\mu\}$. Therefore $\lim_n \int_E f_n\,d\mu$ exists in $\mathbb{R}^*$. Let
us call this limit $k$. For each $n$, $\int_E f_n\,d\mu \le \int_E f\,d\mu$ because $f_n \le f$. Therefore
$k \le \int_E f\,d\mu$, and the problem is to prove the reverse inequality.

Let $c$ be any real number with $0 < c < 1$, to be regarded as close to 1, and let
$s$ be a simple function with $0 \le s \le f$. Define

$$E_n = \{x \in E \mid f_n(x) \ge cs(x)\}.$$

These sets are measurable, and $E_1 \subseteq E_2 \subseteq E_3 \subseteq \cdots \subseteq E$. Let us see that
$E = \bigcup_{n=1}^\infty E_n$. If $f(x) = 0$ for a particular $x$ in $E$, then $f_n(x) = 0$ for all $n$
and also $cs(x) = 0$. Thus $x$ is in every $E_n$. If $f(x) > 0$, then the inequality
$f(x) \ge s(x)$ forces $f(x) > cs(x)$. Since $f_n(x)$ has increasing limit $f(x)$, $f_n(x)$
must be $\ge cs(x)$ eventually, and then $x$ is in $E_n$. In either case $x$ is in $\bigcup_{n=1}^\infty E_n$.
Thus $E = \bigcup_{n=1}^\infty E_n$.

For every $n$, we have

$$k \ge \int_E f_n\,d\mu \ge \int_{E_n} f_n\,d\mu \ge \int_{E_n} cs\,d\mu = c \int_{E_n} s\,d\mu.$$

Since, by Theorem 5.19, the integral is a completely additive set function, Proposition
5.2 shows that $\lim_n \int_{E_n} s\,d\mu = \int_E s\,d\mu$. Therefore $k \ge c \int_E s\,d\mu$. Since
$c$ is arbitrary with $0 < c < 1$, $k \ge \int_E s\,d\mu$. Taking the supremum over $s$ with
$0 \le s \le f$, we conclude that $k \ge \int_E f\,d\mu$.
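

The theorem can be watched at work in one concrete case. The sketch below is an illustration under assumptions not made in the text: it takes $E = [0, 1)$ with Lebesgue measure, $f(x) = x$, and the dyadic simple functions $s_n(x) = \lfloor 2^n x \rfloor / 2^n$, which form one standard increasing sequence with pointwise limit $f$ (whether this is exactly the sequence of Proposition 5.11 is an assumption). The helper name `integral_dyadic_step` is hypothetical.

```python
from fractions import Fraction

def integral_dyadic_step(n):
    """Integral over [0, 1) of s_n(x) = floor(2**n * x) / 2**n.

    s_n takes the value j / 2**n on [j / 2**n, (j + 1) / 2**n), an interval
    of Lebesgue measure 1 / 2**n, so the integral is a finite sum.
    """
    N = 2 ** n
    return sum(Fraction(j, N) * Fraction(1, N) for j in range(N))

values = [integral_dyadic_step(n) for n in range(1, 8)]

# The integrals increase with n, as the monotonicity of {s_n} requires.
assert all(a < b for a, b in zip(values, values[1:]))

# Each integral equals 1/2 - 1/2**(n + 1), so the limit is 1/2.
assert all(v == Fraction(1, 2) - Fraction(1, 2 ** (n + 1))
           for n, v in enumerate(values, start=1))
```

The limit $1/2$ agrees with the Riemann integral $\int_0^1 x\,dx$, consistent with the coincidence of the two integrals for continuous functions.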

Proposition 5.26. If $f$ and $g$ are measurable functions, if their sum $h = f + g$
is everywhere defined, and if $\int_E f\,d\mu + \int_E g\,d\mu$ is defined, then $\int_E h\,d\mu$ is
defined and

$$\int_E h\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu.$$
E E E 


REMARK. It may seem surprising that complete additivity plays a role in the 
proof of this proposition, since it apparently played no role in the linearity of the 
Riemann integral. In fact, although complete additivity is used when f and g 
are unbounded, it can be avoided when f and g are bounded, as will be observed 
in Problems 42-43 at the end of the chapter. The distinction between the two 
cases is that the pointwise convergence in Proposition 5.11 is actually uniform if 
the given function is bounded, whereas it cannot be uniform for an unbounded 
function because the uniform limit of bounded functions is bounded. 


PROOF. The sum $h$ is measurable by Proposition 5.7. For the conclusions
about integration, first assume that $f \ge 0$ and $g \ge 0$. In the case of simple
functions $s = t + u$ with $t \ge 0$ and $u \ge 0$, we use Proposition 5.14 and Lemma
5.12. The proposition shows that we are to prove that $\mathcal{I}_E(s) = \mathcal{I}_E(t) + \mathcal{I}_E(u)$,
and the lemma shows that we can use expansions of $t$ and $u$ into sets on which
$t$ and $u$ are both constant and the conclusion about $\mathcal{I}_E(s)$ is evident. If $f$ and
$g$ are $\ge 0$ and are not necessarily simple, then we can use Proposition 5.11 to
find increasing sequences $\{t_n\}$ and $\{u_n\}$ of simple functions $\ge 0$ with limits $f$
and $g$. If $s_n = t_n + u_n$, then $s_n$ is nonnegative simple, and $\{s_n\}$ increases to
$h$. For each $n$, we have just proved that $\int_E s_n\,d\mu = \int_E t_n\,d\mu + \int_E u_n\,d\mu$, and
therefore $\int_E h\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu$ by the Monotone Convergence Theorem
(Theorem 5.25).

The next case is that $f \ge 0$, $g \le 0$, and $h = f + g \ge 0$. Then $f =
h + (-g)$ with $h \ge 0$ and $(-g) \ge 0$, so that $\int_E f\,d\mu = \int_E h\,d\mu + \int_E (-g)\,d\mu$.
Hence $\int_E h\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu$, provided the right side is defined.

For a general $h \ge 0$, we decompose $E$ into the disjoint union of three sets,
one where $f \ge 0$ and $g \ge 0$, one where $f \ge 0$ and $g < 0$, and one where $f < 0$
and $g \ge 0$. The additivity of the integral as a set function (Theorem 5.19), in
combination with the cases that we have already proved, then gives the desired
result. Finally for general $h$, we have only to write $h = h^+ - h^-$ and consider
$h^+$ and $h^-$ separately.


Corollary 5.27. Let $E$ be a measurable set, and let $\{f_n\}$ be a sequence
of measurable functions $\ge 0$. Put $F(x) = \sum_{n=1}^\infty f_n(x)$. Then

$$\int_E F\,d\mu = \sum_{n=1}^\infty \int_E f_n\,d\mu.$$


PROOF. Apply Proposition 5.26 to the $n$th partial sum of the series, and then
use the Monotone Convergence Theorem (Theorem 5.25).


The next corollary is given partly to illustrate a standard technique for passing
from integration results about indicator functions to integration results about 
general functions. This technique is used again and again in measure theory. 


Corollary 5.28. If $f \ge 0$ is a measurable function and if $\nu$ is the measure
$\nu(E) = \int_E f\,d\mu$, then $\int_E g\,d\nu = \int_E gf\,d\mu$ for every measurable function $g$ for
which at least one side is defined.


REMARKS. The set function $\nu$ is a measure by Theorem 5.19. In the situation
of this corollary, we shall write $\nu = f\,d\mu$.


PROOF. By Corollary 5.20 it is enough to prove that

$$\int_X g\,d\nu = \int_X gf\,d\mu. \tag{$*$}$$

For $g = I_E$, $(*)$ is true by hypothesis. Proposition 5.26 shows that $(*)$ extends to
be valid for simple functions $g \ge 0$. For general $g \ge 0$, Proposition 5.11 produces
an increasing sequence $\{s_n\}$ of simple functions $\ge 0$ with pointwise limit $g$. Then
$(*)$ for this $g$ follows from the result for simple functions in combination with
monotone convergence. For general $g$, write $g = g^+ - g^-$, apply $(*)$ for $g^+$ and
$g^-$, and subtract the results using Proposition 5.26.
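

On a finite set, where every function is simple, the identity of Corollary 5.28 reduces to a finite computation. The following Python sketch checks it in one hypothetical case; the space `X = {0, 1, 2, 3}`, the point masses `mu`, and the functions `f` and `g` are all assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical finite measure space: X = {0, 1, 2, 3} with point masses mu.
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(2), 3: Fraction(0)}
f = {0: Fraction(3), 1: Fraction(0), 2: Fraction(1, 4), 3: Fraction(5)}  # f >= 0
g = {0: Fraction(-1), 1: Fraction(7), 2: Fraction(2), 3: Fraction(1)}

def integrate(h, weights, E):
    """Integral of h over E when every subset is measurable and the
    measure is given by the point masses in `weights`."""
    return sum(h[x] * weights[x] for x in E)

# nu = f d(mu): on a finite space the point masses of nu are f(x) * mu({x}).
nu = {x: f[x] * mu[x] for x in mu}

E = {0, 2, 3}
lhs = integrate(g, nu, E)                             # integral of g d(nu) over E
rhs = integrate({x: g[x] * f[x] for x in mu}, mu, E)  # integral of g f d(mu) over E
assert lhs == rhs
```

The check is only the simple-function stage of the proof; the passage to general $g$ is where monotone convergence is needed.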


Theorem 5.29 (Fatou's Lemma). If $E$ is a measurable set and if $\{f_n\}$ is a
sequence of nonnegative measurable functions, then

$$\int_E \liminf_n f_n\,d\mu \le \liminf_n \int_E f_n\,d\mu.$$

In particular, if $f(x) = \lim_n f_n(x)$ exists for all $x$, then

$$\int_E f\,d\mu \le \liminf_n \int_E f_n\,d\mu.$$


REMARK. Fatou’s Lemma applies to both examples that precede the Monotone 
Convergence Theorem (Theorem 5.25), and strict inequality holds in both cases. 


PROOF. Set $g_n(x) = \inf_{k \ge n} f_k(x)$. Then $\lim_n g_n(x) = \liminf_n f_n(x)$, and the
Monotone Convergence Theorem (Theorem 5.25) gives

$$\int_E \liminf_n f_n\,d\mu = \int_E \lim_n g_n\,d\mu = \lim_n \int_E g_n\,d\mu.$$

But $g_n(x) \le f_n(x)$ pointwise, so that $\int_E g_n\,d\mu \le \int_E f_n\,d\mu$ for all $n$. Thus

$$\lim_n \int_E g_n\,d\mu \le \liminf_n \int_E f_n\,d\mu,$$

and the theorem follows.
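

Strict inequality in Fatou's Lemma can also occur when the sequence has no pointwise limit at all. The sketch below uses a hypothetical two-point space with mass $1/2$ at each point and indicator functions that alternate between the two points; because the sequence has period 2, every lim inf is a minimum over one period, which keeps the computation exact.

```python
from fractions import Fraction

# Hypothetical two-point measure space with mass 1/2 at each point.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}

def f(n):
    """f_n is the indicator of {a} for even n and of {b} for odd n."""
    return {"a": 1, "b": 0} if n % 2 == 0 else {"a": 0, "b": 1}

def integrate(h):
    """Integral over the whole space of a function given by its values."""
    return sum(h[x] * mu[x] for x in mu)

# The sequence has period 2, so each lim inf is a minimum over one period.
period = [f(0), f(1)]
liminf_pointwise = {x: min(h[x] for h in period) for x in mu}

lhs = integrate(liminf_pointwise)        # integral of lim inf f_n
rhs = min(integrate(h) for h in period)  # lim inf of the integrals
assert lhs == 0 and rhs == Fraction(1, 2) and lhs < rhs
```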

Theorem 5.30 (Dominated Convergence Theorem). Let $E$ be a measurable
set, and suppose that $\{f_n\}$ is a sequence of measurable functions such that for
some integrable $g$, $|f_n| \le g$ for all $n$. If $f = \lim f_n$ exists pointwise, then
$\lim_n \int_E f_n\,d\mu$ exists, $f$ is integrable on $E$, and

$$\int_E f\,d\mu = \lim_n \int_E f_n\,d\mu.$$


PROOF. The set on which $g$ is infinite has measure 0, since $g$ is integrable. If
we redefine $g$, $f_n$, and $f$ to be 0 on this set, we change no integrals and we affect
the validity of neither the hypotheses nor the conclusion.

By Corollary 5.18, $f$ is integrable on $E$, and so is $f_n$ for every $n$. Applying
Fatou's Lemma (Theorem 5.29) to $f_n + g \ge 0$, we obtain $\int_E (f + g)\,d\mu \le
\liminf \int_E (f_n + g)\,d\mu$. Since $g$ is integrable and everywhere finite, this inequality
becomes

$$\int_E f\,d\mu \le \liminf \int_E f_n\,d\mu.$$

A second application of Fatou's Lemma, this time to $g - f_n \ge 0$, gives
$\int_E (g - f)\,d\mu \le \liminf \int_E (g - f_n)\,d\mu$. Thus

$$-\int_E f\,d\mu \le \liminf \int_E (-f_n)\,d\mu \quad\text{and}\quad \int_E f\,d\mu \ge \limsup \int_E f_n\,d\mu.$$

Therefore $\lim \int_E f_n\,d\mu$ exists and has the value asserted.


Corollary 5.31. Let $E$ be a set of finite measure, let $c \ge 0$ be in $\mathbb{R}$, and suppose
that $\{f_n\}$ is a sequence of measurable functions such that $|f_n| \le c$ for all $n$. If
$f = \lim f_n$ exists pointwise, then $\lim \int_E f_n\,d\mu$ exists, $f$ is integrable on $E$, and

$$\int_E f\,d\mu = \lim \int_E f_n\,d\mu.$$


PROOF. This is the special case $g = c$ in Theorem 5.30.
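

It is instructive to see why Example 2 before the Monotone Convergence Theorem escapes the Dominated Convergence Theorem. Any $g$ with $f_n \le g$ for all $n$ must dominate $\sup_n f_n$, and for the functions $f_n = n$ on $(0, 1/n)$ this supremum equals $j$ on $[1/(j+1), 1/j)$. The Python sketch below, with hypothetical helper names, computes the integral of the supremum over $[1/(N+1), 1)$ exactly and shows that it grows like a harmonic sum, so no integrable dominating function exists.

```python
from fractions import Fraction

def envelope_integral(N):
    """Integral of sup_n f_n over [1/(N+1), 1) for the functions of Example 2.

    On [1/(j+1), 1/j) the supremum equals j, because f_n(x) = n exactly
    when n < 1/x; the interval has length 1/j - 1/(j+1).
    """
    return sum(j * (Fraction(1, j) - Fraction(1, j + 1)) for j in range(1, N + 1))

def harmonic_tail(N):
    """1/2 + 1/3 + ... + 1/(N+1), the simplified form of the sum above."""
    return sum(Fraction(1, j + 1) for j in range(1, N + 1))

for N in (1, 10, 100, 1000):
    assert envelope_integral(N) == harmonic_tail(N)
    # The printed values grow without bound as N increases.
    print(N, float(envelope_integral(N)))
```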


5. Proof of the Extension Theorem 


In this section we shall prove the Extension Theorem, Theorem 5.5. After the
end of the proof, we shall fill in one further detail left from Section 1, namely to show
that a measure on a $\sigma$-ring has a canonical extension to a measure on the smallest
$\sigma$-algebra containing the given $\sigma$-ring.

Most of this section will concern the proof of the Extension Theorem in the case
that $X$ is measurable and $\nu(X)$ is finite. Thus, until further notice, let us assume
that $X$ is a nonempty set, $\mathcal{A}$ is an algebra of subsets of $X$, and $\nu$ is a nonnegative
completely additive set function defined on $\mathcal{A}$ such that $\nu(X) < +\infty$.

In a way, the intuition for the proof is typical of that for many existence- 
uniqueness theorems in mathematics: to see how to prove existence, we assume 
existence and uniqueness outright, see what necessary conditions each of the 
assumptions puts on the object to be constructed, and then begin the proof. 

With the present theorem in the case that $\nu(X)$ is finite, we shall assign to each
subset $E$ of $X$ an upper bound $\mu^*(E)$ and a lower bound $\mu_*(E)$ for the value of
the extended measure on the set $E$. If the existence half of the theorem is valid,
we must have $\mu_*(E) \le \mu^*(E)$ for $E$ in the smallest $\sigma$-algebra containing $\mathcal{A}$. In
fact, we shall see that this inequality holds for all subsets $E$ of $X$. On the other
hand, if $\mu_*(E) < \mu^*(E)$ for some $E$ in the $\sigma$-algebra of interest and if our upper
and lower bounds are good estimates, we might expect that there is more than one
way to define the extended measure on $E$, in contradiction to uniqueness. That
thought suggests trying to prove that $\mu_*(E) = \mu^*(E)$ for the sets of interest. One
way of doing so is to try to prove that the class of subsets for which this equality
holds is a $\sigma$-algebra containing $\mathcal{A}$, and then the common value of $\mu_*$ and $\mu^*$ is
the desired extension.

This procedure in fact works, and the only subtlety is in the definitions of
$\mu_*(E)$ and $\mu^*(E)$. We give these definitions after one preliminary lemma that
will make $\mu_*$ and $\mu^*$ well defined. For orientation, think of the setting as the
unit interval $[0, 1]$, with Lebesgue measure to be extended from the elementary
sets to the Borel sets. In this case the families $\mathcal{U}$ and $\mathcal{K}$ in the first lemma contain
all the open sets and all the compact sets, respectively, and may be regarded as
generalizations of these collections of sets.


Lemma 5.32. Let $\mathcal{U}$ be the class of all countable unions of sets in $\mathcal{A}$, and let
$\mathcal{K}$ be the class of all countable intersections of sets in $\mathcal{A}$. Then $\mu^*$ and $\mu_*$ are
consistently defined on $\mathcal{U}$ and $\mathcal{K}$, respectively, by letting

$$\mu^*(U) = \lim \nu(A_n) \quad\text{and}\quad \mu_*(K) = \lim \nu(C_n)$$

whenever $\{A_n\}$ is an increasing sequence of sets in $\mathcal{A}$ with union $U$ and $\{C_n\}$ is a
decreasing sequence of sets in $\mathcal{A}$ with intersection $K$. Moreover, $\mu^*$ and $\mu_*$ have
the following properties:

(a) $\mu^*$ and $\mu_*$ agree with $\nu$ on sets of $\mathcal{A}$,
(b) $\mu^*(U) \le \mu^*(V)$ whenever $U$ is in $\mathcal{U}$, $V$ is in $\mathcal{U}$, and $U \subseteq V$,
(c) $\mu_*(K) \le \mu_*(L)$ whenever $K$ is in $\mathcal{K}$, $L$ is in $\mathcal{K}$, and $K \subseteq L$,
(d) $\lim \mu^*(U_n) = \mu^*(U)$ whenever $\{U_n\}$ is an increasing sequence of sets in
$\mathcal{U}$ with union $U$.


PROOF. If $\{B_n\}$ is another increasing sequence in $\mathcal{A}$ with union $U$, then
Proposition 5.2 and Theorem 1.13 give

$$\lim_m \nu(A_m) = \lim_m \big(\lim_n \nu(A_m \cap B_n)\big) = \lim_n \big(\lim_m \nu(A_m \cap B_n)\big) = \lim_n \nu(B_n).$$

Hence $\mu^*$ is consistently defined on $\mathcal{U}$. Similarly if $\{D_n\}$ decreases to $K$, then
$C_m \cup D_n$ decreases to $C_m$ as $n$ increases and to $D_n$ as $m$ increases, and
Corollary 5.3 and Theorem 1.13 give

$$\begin{aligned}
\nu(X) - \lim_m \nu(C_m) &= \nu(X) - \lim_m \big(\lim_n \nu(C_m \cup D_n)\big) \\
&= \nu(X) - \lim_n \big(\lim_m \nu(C_m \cup D_n)\big) = \nu(X) - \lim_n \nu(D_n),
\end{aligned}$$

and hence $\lim_n \nu(C_n) = \lim_n \nu(D_n)$. Thus $\mu_*$ is consistently defined on $\mathcal{K}$. The
set functions $\mu^*$ and $\mu_*$ are defined on all of $\mathcal{U}$ and $\mathcal{K}$ because a set that is a
countable union (or intersection) of sets in an algebra is a countable increasing
union (or decreasing intersection).

Of the four properties, (a) is clear, and (b) and (c) follow from the inequalities

$$\mu^*(U) = \sup_{A \subseteq U,\ A \in \mathcal{A}} \nu(A) \le \sup_{A \subseteq V,\ A \in \mathcal{A}} \nu(A) = \mu^*(V)$$

and

$$\mu_*(K) = \inf_{A \supseteq K,\ A \in \mathcal{A}} \nu(A) \le \inf_{A \supseteq L,\ A \in \mathcal{A}} \nu(A) = \mu_*(L).$$

In (d), $U$ is in $\mathcal{U}$, since the countable union of countable unions is again a
countable union, and (b) shows that $\lim \mu^*(U_n) \le \mu^*(U)$. For each $n$, let $\{A_m^{(n)}\}$
be an increasing sequence of sets from $\mathcal{A}$ with union $U_n$. Arrange all the $A_m^{(n)}$ in
a sequence, and let $B_k$ denote the union of the first $k$ members of the sequence.
Then $\{B_k\}$ is an increasing sequence with union $U$. Let $\epsilon > 0$ be given, and
choose $M$ large enough so that $\mu^*(B_M) \ge \mu^*(U) - \epsilon$. Since the sets $U_n$ increase,
since $B_M$ is a finite union of sets $A_m^{(n)}$, and since $A_m^{(n)} \subseteq U_n$, we must have
$\mu^*(U_N) \ge \mu^*(B_M)$ for some $N$. But then

$$\lim \mu^*(U_n) \ge \mu^*(U_N) \ge \mu^*(B_M) \ge \mu^*(U) - \epsilon.$$

Since $\epsilon$ is arbitrary, $\lim \mu^*(U_n) \ge \mu^*(U)$.


For each subset $E$ of $X$, we define

$$\mu^*(E) = \inf_{U \supseteq E,\ U \in \mathcal{U}} \mu^*(U) \quad\text{and}\quad \mu_*(E) = \sup_{K \subseteq E,\ K \in \mathcal{K}} \mu_*(K).$$

Conclusions (b) and (c) of Lemma 5.32 show that the new definitions of $\mu^*$ and
$\mu_*$ are consistent with the old ones. The set functions $\mu^*$ and $\mu_*$ on arbitrary subsets
$E$ of $X$ may be called the outer measure and the inner measure associated to $\nu$.
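

The definitions become finite computations when the algebra itself is finite, since then every countable union or intersection of members is again a member and $\mathcal{U} = \mathcal{K} = \mathcal{A}$. The Python sketch below works in that degenerate setting; the six-point set, the three blocks generating the algebra, and their weights are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite example: X = {0, ..., 5}, and the algebra consists of
# all unions of the three blocks below, with nu additive over blocks.
blocks = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})]
weight = {blocks[0]: Fraction(1, 2), blocks[1]: Fraction(1, 3),
          blocks[2]: Fraction(1, 6)}

def algebra():
    """Yield (A, nu(A)) for every set A in the algebra."""
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            A = frozenset().union(*chosen)
            yield A, sum((weight[b] for b in chosen), Fraction(0))

def outer(E):
    """Outer measure: infimum of nu(A) over algebra sets A containing E."""
    return min(v for A, v in algebra() if E <= A)

def inner(E):
    """Inner measure: supremum of nu(A) over algebra sets A contained in E."""
    return max(v for A, v in algebra() if A <= E)

for E in ({0}, {0, 1, 2}, {2, 3}):
    # Equality of the two values singles out the sets the algebra can measure.
    print(sorted(E), inner(E), outer(E), inner(E) == outer(E))
```

In this example the set $\{0\}$ splits a block, so its inner and outer measures differ, while $\{0, 1, 2\}$ is a union of blocks and the two values agree.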

Lemma 5.33. If $A$ and $B$ are subsets of $X$ with $A \subseteq B$, then $\mu^*(A) \le \mu^*(B)$
and $\mu_*(A) \le \mu_*(B)$. In addition,

(a) if $E \subseteq \bigcup_{n=1}^\infty E_n$, then $\mu^*(E) \le \sum_{n=1}^\infty \mu^*(E_n)$,
(b) if $F$ and $G$ are disjoint, then $\mu_*(F) + \mu_*(G) \le \mu_*(F \cup G)$.


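PROOF. Since $\mu^*(A)$ is an infimum over a larger class of sets than $\mu^*(B)$ is,
we have $\mu^*(A) \le \mu^*(B)$. Similarly $\mu_*(A) \le \mu_*(B)$.

For (a), let $E \subseteq \bigcup_{n=1}^\infty E_n$. In the special case in which $E_n$ is in $\mathcal{U}$ for all $n$,
let $\{F_m^{(n)}\}$ be, for fixed $n$ and varying $m$, an increasing sequence of sets in $\mathcal{A}$ with
union $E_n$. For any $N$, we then have $\bigcup_{m=1}^\infty \big(F_m^{(1)} \cup \cdots \cup F_m^{(N)}\big) = E_1 \cup \cdots \cup E_N$.
Hence

$$\begin{aligned}
\mu^*(E) &\le \mu^*\Big(\bigcup_{n=1}^\infty E_n\Big) = \lim_N \mu^*\Big(\bigcup_{n=1}^N E_n\Big) && \text{by Lemma 5.32d} \\
&= \lim_N \lim_m \nu\big(F_m^{(1)} \cup \cdots \cup F_m^{(N)}\big) && \text{by definition of } \mu^* \text{ on } \mathcal{U} \\
&\le \lim_N \lim_m \sum_{n=1}^N \nu\big(F_m^{(n)}\big) && \text{by Proposition 5.1f} \\
&= \lim_N \sum_{n=1}^N \mu^*(E_n) = \sum_{n=1}^\infty \mu^*(E_n).
\end{aligned}$$

For general subsets $E_n$ of $X$, choose $U_n$ in $\mathcal{U}$ with $U_n \supseteq E_n$ and $\mu^*(U_n) \le
\mu^*(E_n) + \epsilon/2^n$. Then $E \subseteq \bigcup_n U_n$, and the special case applied to the $U_n$ shows
that

$$\mu^*(E) \le \mu^*\Big(\bigcup_n U_n\Big) \le \sum_n \mu^*(U_n) \le \sum_n \mu^*(E_n) + \epsilon.$$

Hence $\mu^*(E) \le \sum_n \mu^*(E_n)$, and (a) is proved.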

For (b), let $F$ and $G$ be disjoint. In the special case in which $F$ and $G$ are in
$\mathcal{K}$, let $\{F_n\}$ and $\{G_n\}$ be decreasing sequences of sets in $\mathcal{A}$ with intersections $F$
and $G$. Then

$$\begin{aligned}
\mu_*(F \cup G) &= \lim \nu(F_n \cup G_n) && \text{by definition of } \mu_* \text{ on } \mathcal{K} \\
&= \lim \big(\nu(F_n) + \nu(G_n) - \nu(F_n \cap G_n)\big) && \text{by Proposition 5.1b} \\
&= \mu_*(F) + \mu_*(G),
\end{aligned}$$

the last step holding by Corollary 5.3, since $F \cap G$ is empty. For general disjoint
subsets $F$ and $G$ in $X$, choose $K$ and $L$ in $\mathcal{K}$ with $K \subseteq F$, $L \subseteq G$, $\mu_*(K) \ge
\mu_*(F) - \epsilon$, and $\mu_*(L) \ge \mu_*(G) - \epsilon$. Then

$$\mu_*(F \cup G) \ge \mu_*(K \cup L) = \mu_*(K) + \mu_*(L) \ge \mu_*(F) + \mu_*(G) - 2\epsilon,$$

the middle step holding by the special case. Hence $\mu_*(F \cup G) \ge \mu_*(F) + \mu_*(G)$,
and (b) is proved.


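Lemma 5.34. For every subset $E$ of $X$, $\mu_*(E) \le \mu^*(E)$. Equality holds if $E$
is in $\mathcal{U}$ or $\mathcal{K}$.


PROOF. The proof is in three steps.

First we prove that if $U$ is in $\mathcal{U}$ and $K$ is in $\mathcal{K}$, then $\mu^*(U) \le \mu_*(U)$ and
$\mu^*(K) \le \mu_*(K)$. In fact, choose $C$ in $\mathcal{A}$ with $C \subseteq U$ and $\mu^*(U) \le \nu(C) + \epsilon$.
Then $\mu^*(U) \le \nu(C) + \epsilon \le \mu_*(U) + \epsilon$ by Lemma 5.33 since $C \subseteq U$. Hence
$\mu^*(U) \le \mu_*(U)$. Similarly choose $D$ in $\mathcal{A}$ with $D \supseteq K$ and $\mu_*(K) \ge \nu(D) - \epsilon$.
Then $\mu_*(K) \ge \nu(D) - \epsilon \ge \mu^*(K) - \epsilon$, and hence $\mu_*(K) \ge \mu^*(K)$.

Second we prove that if $K$ is in $\mathcal{K}$, then $\mu^*(K) \ge \mu_*(K)$. In fact, choose $C$
in $\mathcal{A}$ with $C \supseteq K$ and $\nu(C) - \mu_*(K) < \epsilon$. Then $C - K$ is in $\mathcal{U}$, and

$$\begin{aligned}
\mu_*(K) \le \nu(C) &\le \mu^*(C - K) + \mu^*(K) && \text{by Lemma 5.33a} \\
&\le \big(\mu_*(C - K) + \mu_*(K)\big) - \mu_*(K) + \mu^*(K) && \text{by the previous step} \\
&\le \nu(C) - \mu_*(K) + \mu^*(K) && \text{by Lemma 5.33b} \\
&\le \mu^*(K) + \epsilon && \text{by the choice of } C.
\end{aligned}$$

Combining this inequality with the previous step, we see that $\mu^*(K) = \mu_*(K)$.

Third we prove that $\mu_*(E) \le \mu^*(E)$ for every $E$. In fact, find $K$ in $\mathcal{K}$ and $U$
in $\mathcal{U}$ with $K \subseteq E \subseteq U$, $\mu_*(K) \ge \mu_*(E) - \epsilon$, and $\mu^*(U) \le \mu^*(E) + \epsilon$. Then
$\mu_*(E) \le \mu_*(K) + \epsilon = \mu^*(K) + \epsilon \le \mu^*(U) + \epsilon \le \mu^*(E) + 2\epsilon$, and the proof
is complete.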


Define a subset $E$ of $X$ to be measurable for purposes of this section if
$\mu_*(E) = \mu^*(E)$, and let $\mathcal{B}$ be the class of measurable subsets of $X$. Lemma 5.34
shows that $\mathcal{U}$ and $\mathcal{K}$ are both contained in $\mathcal{B}$.


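Lemma 5.35. If $U$ is in $\mathcal{U}$ and $K$ is in $\mathcal{K}$ with $K \subseteq U$, then

$$\mu^*(U - K) = \mu^*(U) - \mu_*(K).$$

If $E$ is measurable, then for any $\epsilon > 0$, there are sets $K$ in $\mathcal{K}$ and $U$ in $\mathcal{U}$ with
$K \subseteq E \subseteq U$ and

$$\mu^*(E - K) \le \mu^*(U - K) < \epsilon.$$


PROOF. For the first conclusion, $U - K$ is in $\mathcal{U}$ and hence $\mu^*(U - K) =
\mu_*(U - K) \le \mu_*(U) - \mu_*(K) = \mu^*(U) - \mu_*(K)$ by Lemma 5.34, Lemma
5.33b, and Lemma 5.34 again. In the reverse direction, Lemma 5.33a and Lemma
5.34 give $\mu^*(U) \le \mu^*(U - K) + \mu^*(K) = \mu^*(U - K) + \mu_*(K)$.

For the second conclusion choose $K$ in $\mathcal{K}$ and $U$ in $\mathcal{U}$ with $K \subseteq E \subseteq U$,
$\mu_*(K) + \frac{\epsilon}{2} > \mu_*(E)$, and $\mu^*(E) \ge \mu^*(U) - \frac{\epsilon}{2}$. Since $\mu_*(E) = \mu^*(E)$ by
the assumed measurability, we see that $\mu_*(K) + \frac{\epsilon}{2} > \mu^*(U) - \frac{\epsilon}{2}$, hence that
$\mu^*(U) - \mu_*(K) < \epsilon$. The result now follows from Lemma 5.33 and the first
conclusion of the present lemma.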


Lemma 5.36. The class $\mathcal{B}$ of measurable sets is a $\sigma$-algebra containing $\mathcal{A}$, and
the restriction of $\mu^*$ to $\mathcal{B}$ is a measure.


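PROOF. Certainly $\mathcal{B} \supseteq \mathcal{A}$. The rest of the proof is in three steps.

First we prove that the intersection of two measurable sets is measurable. In
fact, let $F$ and $G$ be in $\mathcal{B}$, and use Lemma 5.35 to choose $K \subseteq F$ and $L \subseteq G$ with
$\mu^*(F - K) < \epsilon$ and $\mu^*(G - L) < \epsilon$. Since $F \cap G \subseteq (F - K) \cup (K \cap L) \cup (G - L)$,

$$\begin{aligned}
\mu^*(F \cap G) &\le \mu^*(F - K) + \mu^*(K \cap L) + \mu^*(G - L) && \text{by Lemma 5.33a} \\
&\le \mu^*(K \cap L) + 2\epsilon && \text{by definition of } K \text{ and } L \\
&= \mu_*(K \cap L) + 2\epsilon && \text{by Lemma 5.34} \\
&\le \mu_*(F \cap G) + 2\epsilon && \text{since } K \cap L \subseteq F \cap G.
\end{aligned}$$

Second we prove that the complement of a measurable set is measurable. Let
$E$ be measurable. By Lemma 5.35 choose $K$ in $\mathcal{K}$ and $U$ in $\mathcal{U}$ with $K \subseteq E \subseteq U$
and $\mu^*(U - K) < \epsilon$. Since $U^c \subseteq E^c \subseteq K^c$ and $K^c - U^c = U - K$, we have

$$\begin{aligned}
\mu^*(E^c) &\le \mu^*(K^c - U^c) + \mu^*(U^c) && \text{by Lemma 5.33a} \\
&= \mu^*(U - K) + \mu_*(U^c) && \text{since } U^c \text{ is in } \mathcal{K} \\
&\le \epsilon + \mu_*(E^c).
\end{aligned}$$

Thus the complement of a measurable set is measurable, and $\mathcal{B}$ is an algebra of
sets.

Third we prove that the countable disjoint union of measurable sets is measurable,
and $\mu^*$ is a measure on $\mathcal{B}$. In fact, let $\{E_n\}$ be a sequence of disjoint sets in
$\mathcal{B}$. Application of Lemma 5.33a, Lemma 5.33b, and Lemma 5.34 gives

$$\begin{aligned}
\mu^*\Big(\bigcup_{n=1}^\infty E_n\Big) &\le \sum_{n=1}^\infty \mu^*(E_n) = \sum_{n=1}^\infty \mu_*(E_n) = \lim_N \sum_{n=1}^N \mu_*(E_n) \\
&\le \lim_N \mu_*\Big(\bigcup_{n=1}^N E_n\Big) \le \mu_*\Big(\bigcup_{n=1}^\infty E_n\Big) \le \mu^*\Big(\bigcup_{n=1}^\infty E_n\Big).
\end{aligned}$$

The end members of this chain of inequalities are equal, and thus equality must
hold throughout: $\mu_*\big(\bigcup_n E_n\big) = \mu^*\big(\bigcup_n E_n\big) = \sum_n \mu^*(E_n)$. Consequently $\bigcup_n E_n$
is measurable, and $\mu^*$ is completely additive.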


PROOF OF THEOREM 5.5 UNDER THE SPECIAL HYPOTHESES. We continue to
assume that the given ring of subsets of $X$ is an algebra and that $\nu(X)$ is finite.
Define $\mathcal{B}$ to be the class of measurable sets in the previous construction. Then
Lemma 5.36 shows that $\mathcal{B}$ is a $\sigma$-algebra containing $\mathcal{A}$. Hence $\mathcal{B}$ contains the
smallest $\sigma$-algebra $\mathcal{C}$ containing $\mathcal{A}$. Lemma 5.36 shows also that the restriction
of $\mu^*$ to $\mathcal{C}$ is a measure extending $\nu$. This proves existence of the extension under
the special hypotheses.

For uniqueness, suppose that $\mu'$ is an extension of $\nu$ to $\mathcal{C}$. Proposition 5.2
and Corollary 5.3 show that $\mu'$ has to agree with $\mu^*$ on $\mathcal{U}$ and with $\mu_*$ on $\mathcal{K}$. If
$K \subseteq E \subseteq U$ with $K$ in $\mathcal{K}$ and $U$ in $\mathcal{U}$, then we have

$$\mu_*(K) = \mu'(K) \le \mu'(E) \le \mu'(U) = \mu^*(U).$$

Taking the supremum over $K$ and the infimum over $U$ gives $\mu_*(E) \le \mu'(E) \le
\mu^*(E)$. Since $E$ is in $\mathcal{B}$, $\mu_*(E) = \mu^*(E)$, and we see that $\mu'(E) = \mu^*(E)$.
Thus $\mu'$ coincides with the restriction of $\mu^*$ to $\mathcal{C}$. This proves uniqueness of the
extension under the special hypotheses.


Now we return to the general hypotheses of Theorem 5.5 (that $\mathcal{R}$ is a ring of
subsets of $X$, that $\nu$ is a nonnegative completely additive set function on $\mathcal{R}$, and
that $\nu$ is $\sigma$-finite), and we shall complete the proof that $\nu$ extends uniquely to a
measure on the smallest $\sigma$-ring $\mathcal{C}$ containing $\mathcal{R}$.


PROOF OF THEOREM 5.5 IN THE GENERAL CASE. If S is an element of R with 
v(S) finite, define SOR = {s OR | Re R}. Then (S, SOR, ee is a set 
of data satisfying the special hypotheses of the Extension Theorem considered 
above. By the special case, if Cs denotes the smallest o-algebra of subsets of S 
containing SNR, then v| Gam has a unique extension to a measure jus on Cs. The 
measures js have a certain consistency property because of the uniqueness: if 
S’ Cc S, then ws siaR = bes’: 

Now let {S,,} be a sequence of sets in R with union S in C and with v(S,,) 
finite for all n. Possibly replacing each set S, by the difference of S,, and all 
previous S;’s, we may assume that the sequence is disjoint. We define zs on 
the o-algebra S.C of subsets S by ws(E) = Yo, ws, (EO Sy) for E in SNC. 
Let us check that zs is unambiguously defined and is completely additive. If 
{Tin} is another sequence of sets in ® with union S and with v(T,,) finite for 
all m, then the corresponding definition of a set function on SNC is w4(E) = 
Yon LT, (EO Tm). The consistency property from the previous paragraph gives 
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us Ls (EN Sy OT) = Mr, (EAS, 0 T,,). Then Corollary 1.15 allows us to 
write 


Hs(E) = D0 wr, (EN Tn) =) YM (E Sn 0 Tn) 


=) PS ENSA T=) >: BsdE 1S, OT) 


=o us,(EM Sp) = ws(E), 


and we see that fs is unambiguously defined. To check that ws is completely 
additive, let F,, F2,... be a disjoint sequence of sets in SMC with union F. Then 
the complete additivity of 15, , in combination with Corollary 1.15, gives 


ts(F) = >) us, (FOS) = >) ds, (Fm O Sn) 


= S°S— us, Fin 0 Sn) = Shs Fn), 


and thus js is completely additive. 

The measures jus are consistent on their common domains. To see the consis- 
tency, let us see that zs and (zr agree on subsets of SM 7. Let S be the countable 
disjoint union of sets S, in R, and let T be the countable disjoint union of sets T,, 
in R. Then $M T is the countable disjoint union of the sets S$, Tm. If E is in 
(SAT) NMC, then Corollary 1.15 and the consistency property of the set functions 
Ltr for R in R yield 


us(E) = >> ws,(EN Sn») = >> ws, (EO Sn NT) 
=) EOS 0) = > tere EOS OT) 
= 0S usnm (EO Sn Tm) = >> >) tr, (EO Sp Tn) 


= Do ot, (ENS AT) =D) M1, (EO Tn) = Wr (E). 


Hence the measures js are consistent on their common domains. 

If M denotes the set of subsets of X that are contained in a countable union 
of members of 7 on which v is finite, then M is closed under countable unions 
and differences and is thus a o-ring containing ?. It therefore contains C, and we 
conclude that every member of C is contained in a countable union of members of 
R on which v is finite. It follows that we can define yz on all of C as follows: if E 
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is in C, then £ is contained in some countable union S of members of 7 on which 
v is finite, and we define w(E) = ws(E). We have seen that the measures ws 
are consistently defined, and hence jz(£) is well defined. If a countable disjoint 
union E = (Jy_, En of sets in C is given, then all the sets in question lie in a 
single S, and we then have u(E) = ws(E) = 0°, ws (En) = O°, w(E,). In 
other words, jz is completely additive. This proves existence. 

For uniqueness let $E$ be given in $\mathcal{C}$, and suppose that $S$ is a member of $\mathcal{C}$
containing $E$ and equal to the countable disjoint union of sets $S_n$ in $\mathcal{R}$ with $\nu(S_n)$
finite for all $n$. We have seen that the value of $\mu(E \cap S_n) = \mu_{S_n}(E \cap S_n)$ is
determined by $\nu|_{S_n \cap \mathcal{R}}$, hence by $\nu$ on $\mathcal{R}$. By complete additivity of $\mu$, $\mu(E)$ is
determined by the values of $\mu(E \cap S_n)$ for all $n$. Therefore $\mu$ on $\mathcal{C}$ is determined
by $\nu$ on $\mathcal{R}$. This proves uniqueness.


As was promised, we shall now fill in one further detail left from Section 1: to
show that a measure on a $\sigma$-ring has a canonical extension to a measure on the
smallest $\sigma$-algebra containing the given $\sigma$-ring.


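Proposition 5.37. Let $\mathcal{R}$ be a $\sigma$-ring of subsets of a nonempty set $X$, let $\mathcal{R}_c$
be the set of complements in $X$ of the members of $\mathcal{R}$, and let $\mathcal{A}$ be the smallest
$\sigma$-algebra containing $\mathcal{R}$. Then either


(i) $\mathcal{R} = \mathcal{R}_c = \mathcal{A}$ or
(ii) $\mathcal{R} \cap \mathcal{R}_c = \varnothing$ and $\mathcal{A} = \mathcal{R} \cup \mathcal{R}_c$.


In the latter case any measure $\mu$ on $\mathcal{R}$ has a canonical extension to a measure
$\mu_1$ on $\mathcal{A}$ given by $\mu_1(E) = \sup \{\mu(F) \mid F \in \mathcal{R} \text{ and } F \subseteq E\}$ for $E$ in $\mathcal{R}_c$.
This canonical extension has the property that any other extension $\mu_2$ satisfies
$\mu_2 \ge \mu_1$.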


PROOF. If $X$ is in $\mathcal{R}$, then $\mathcal{R}$ is closed under complements, since $\mathcal{R}$ is closed
under differences; hence $\mathcal{R} = \mathcal{R}_c = \mathcal{A}$. If $X$ is not in $\mathcal{R}$, then $\mathcal{R} \cap \mathcal{R}_c = \varnothing$
because any set $E$ in the intersection has $E^c$ in the intersection and then also
$X = E \cup E^c$ in the intersection. In this latter case it is plain that $\mathcal{A} \supseteq \mathcal{R} \cup \mathcal{R}_c$.
Thus (ii) will be the only alternative to (i) if it is proved that $\mathcal{B} = \mathcal{R} \cup \mathcal{R}_c$ is
a $\sigma$-algebra. Certainly $\mathcal{B}$ is closed under complements. To see that $\mathcal{B}$ is closed
under countable unions, we may assume, because $\mathcal{R}$ is a $\sigma$-ring, that we are to
check the union of countably many sets with at least one in $\mathcal{R}_c$. Thus let $\{E_n\}$
be a sequence of sets in $\mathcal{R}$, and let $\{F_n\}$ be a sequence of sets in $\mathcal{R}_c$. Then
$E = \bigcup_{n=1}^{\infty} E_n$ is in $\mathcal{R}$ and $F = \bigcap_{n=1}^{\infty} F_n^c$ is in $\mathcal{R}$ since $\mathcal{R}$ is a $\sigma$-ring. The union
of the sets $E_n$ and $F_n$ in question is $E \cup F^c = (F - E)^c$, is exhibited as the
complement of the difference of two sets in $\mathcal{R}$, and is therefore in $\mathcal{R}_c$. Thus $\mathcal{B}$ is
closed under countable unions and is a $\sigma$-algebra.

In the case of (ii), let us see that $\mu_1$ is a measure on $\mathcal{A}$. If we are to check
the measure of a disjoint sequence of sets in $\mathcal{A}$, there is no problem if all the sets
are in $\mathcal{R}$, since $\mu_1|_{\mathcal{R}} = \mu$ is completely additive. There cannot be as many as
two of the sets in $\mathcal{R}_c$ because no two sets $F_1$ and $F_2$ in $\mathcal{R}_c$ are disjoint; in fact,
$F_1 \cap F_2 = (F_1^c \cup F_2^c)^c$ exhibits the intersection as in $\mathcal{R}_c$, and the empty set is
not a member of $\mathcal{R}_c$. Thus we may assume that the disjoint sequence consists
of a sequence $\{E_n\}$ of sets in $\mathcal{R}$ and a single set $F$ in $\mathcal{R}_c$. If $E = \bigcup_{n=1}^{\infty} E_n$,
then $\mu_1(E) = \mu(E) = \sum_{n=1}^{\infty} \mu(E_n) = \sum_{n=1}^{\infty} \mu_1(E_n)$. So it is enough to see
that $\mu_1(E \cup F) = \mu(E) + \mu_1(F)$. If $E'$ is a subset of $F$ that is in $\mathcal{R}$, then
$\mu_1(E \cup F) \ge \mu(E \cup E') = \mu(E) + \mu(E')$. Taking the supremum over all such
$E'$ shows that $\mu_1(E \cup F) \ge \mu(E) + \mu_1(F)$. For the reverse inequality let $S$ be
a member of $\mathcal{R}$ contained in $E \cup F$. Then the sets $E \cap S$ and $F \cap S = S - F^c$
are in $\mathcal{R}$, and thus $\mu(S) = \mu(E \cap S) + \mu(F \cap S) \le \mu(E) + \mu_1(F)$. Taking the
supremum over $S$ gives $\mu_1(E \cup F) \le \mu(E) + \mu_1(F)$. Thus $\mu_1$ is completely
additive.

If $\mu_2$ is any other extension, any set $F$ in $\mathcal{R}_c$ has $\mu_2(F) \ge \mu_2(E) = \mu(E)$ for all
subsets $E$ of $F$ that are in $\mathcal{R}$. Taking the supremum over $E$ gives $\mu_2(F) \ge \mu_1(F)$,
and thus $\mu_2 \ge \mu_1$ as set functions on $\mathcal{A}$.
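
Case (ii) can be checked by hand on a small finite model. The sketch below is only an illustration under assumptions that are not in the text: $X = \{0, 1, 2\}$, the $\sigma$-ring consists of all subsets of $\{0, 1\}$, and the point masses and the competing extension `mu2` are hypothetical choices. It verifies that the two classes are disjoint, that their union is a $\sigma$-algebra (here the full power set), that the canonical extension is additive, and that another extension dominates it.

```python
from fractions import Fraction
from itertools import combinations


def powerset(points):
    """All subsets of a finite set, as frozensets."""
    items = sorted(points)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


X = frozenset({0, 1, 2})
ring = powerset({0, 1})             # R: every subset of {0, 1}; X is not in R
ring_c = [X - E for E in ring]      # R_c: complements in X of members of R
algebra = set(ring) | set(ring_c)   # candidate for the smallest sigma-algebra

# Hypothetical point masses defining a measure mu on R.
weights = {0: Fraction(1, 2), 1: Fraction(1, 3)}


def mu(E):
    return sum((weights[p] for p in E), Fraction(0))


def mu1(E):
    """Canonical extension: mu on R, a supremum over members of R on R_c."""
    if E in ring:
        return mu(E)
    return max(mu(F) for F in ring if F <= E)


def mu2(E):
    """Another (hypothetical) extension: it also puts mass 5 on the point 2."""
    return mu(E - {2}) + (5 if 2 in E else 0)


case_ii = not (set(ring) & set(ring_c)) and algebra == set(powerset(X))
additive = all(mu1(E | F) == mu1(E) + mu1(F)
               for E in algebra for F in algebra if not (E & F))
extends = all(mu2(E) == mu(E) for E in ring)
dominates = all(mu2(E) >= mu1(E) for E in algebra)
print(case_ii, additive, extends, dominates, mu1(X))
```

In this model the canonical extension gives the point 2 no mass at all, which is why every other extension is at least as large.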


6. Completion of a Measure Space 


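If $(X, \mathcal{A}, \mu)$ is a measure space, we define the completion of this space to be the
measure space $(X, \overline{\mathcal{A}}, \overline{\mu})$ defined by


\[
\overline{\mathcal{A}} = \{\, E \,\Delta\, Z \mid E \text{ is in } \mathcal{A} \text{ and } Z \subseteq Z' \text{ for some } Z' \in \mathcal{A} \text{ with } \mu(Z') = 0 \,\},
\]

\[
\overline{\mu}(E \,\Delta\, Z) = \mu(E).
\]


It is necessary to verify that the result is in fact a measure space, and we shall
carry out this step in the proposition below. In the case of Lebesgue measure $m$
on the line, when initially defined on the $\sigma$-algebra $\mathcal{A}$ of Borel sets, the sets in
the $\sigma$-algebra $\overline{\mathcal{A}}$ are said to be Lebesgue measurable.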


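Proposition 5.38. If $(X, \mathcal{A}, \mu)$ is a measure space, then the completion
$(X, \overline{\mathcal{A}}, \overline{\mu})$ is a measure space. Specifically
(a) $\overline{\mathcal{A}}$ is a $\sigma$-algebra containing $\mathcal{A}$,
(b) the set function $\overline{\mu}$ is unambiguously defined on $\overline{\mathcal{A}}$, i.e., if $E_1 \,\Delta\, Z_1 =
E_2 \,\Delta\, Z_2$ as above, then $\mu(E_1) = \mu(E_2)$,
(c) $\overline{\mu}$ is a measure on $\overline{\mathcal{A}}$, and $\overline{\mu}(E) = \mu(E)$ for all sets $E$ in $\mathcal{A}$.
In addition,
(d) if $\tilde{\mu}$ is any measure on $\overline{\mathcal{A}}$ such that $\tilde{\mu}(E) = \mu(E)$ for all $E$ in $\mathcal{A}$, then
$\tilde{\mu} = \overline{\mu}$ on $\overline{\mathcal{A}}$,
(e) if $\mu(X) < +\infty$ and if for $E \subseteq X$, $\mu_*(E)$ and $\mu^*(E)$ are defined by


\[
\mu_*(E) = \sup_{A \subseteq E,\ A \in \mathcal{A}} \mu(A) \quad\text{and}\quad \mu^*(E) = \inf_{A \supseteq E,\ A \in \mathcal{A}} \mu(A),
\]


then $E$ is in $\overline{\mathcal{A}}$ if and only if $\mu_*(E) = \mu^*(E)$.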


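PROOF. For (a), certainly $\mathcal{A} \subseteq \overline{\mathcal{A}}$ because we can use $Z = Z' = \varnothing$ in the
definition of $\overline{\mathcal{A}}$. Since $(E \,\Delta\, Z)^c = (E \,\Delta\, Z) \,\Delta\, X = (E \,\Delta\, X) \,\Delta\, Z = E^c \,\Delta\, Z$, $\overline{\mathcal{A}}$
is closed under complements.

To prove closure under countable unions, let us first prove that


\[
\overline{\mathcal{A}} = \{\, E \cup Z \mid E \text{ is in } \mathcal{A} \text{ and } Z \subseteq Z' \text{ for some } Z' \in \mathcal{A} \text{ with } \mu(Z') = 0 \,\}. \tag{$*$}
\]


Thus let $E \cup Z$ be given, with $Z \subseteq Z'$. Then $E \cup Z = E \,\Delta\, (Z \,\Delta\, (E \cap Z))$ with
$Z \,\Delta\, (E \cap Z) \subseteq Z'$. So $E \cup Z$ is in $\overline{\mathcal{A}}$. Conversely if $E \,\Delta\, Z$ is in $\overline{\mathcal{A}}$, we can write
$E \,\Delta\, Z = (E - Z') \cup \big(((E \cap Z') - Z) \cup (Z - E)\big)$ with $((E \cap Z') - Z) \cup (Z - E) \subseteq Z'$,
and then we see that $E \,\Delta\, Z$ is of the form $E'' \cup Z''$ with $E''$ in $\mathcal{A}$ and $Z'' \subseteq Z'$.

Returning to the proof of closure under countable unions, let $E_n \cup Z_n$ be given
in $\overline{\mathcal{A}}$ with $Z_n \subseteq Z_n'$ and $\mu(Z_n') = 0$. Then $\bigcup_n (E_n \cup Z_n) = \big(\bigcup_n E_n\big) \cup \big(\bigcup_n Z_n\big)$
with $\bigcup_n Z_n \subseteq \bigcup_n Z_n'$ and $\mu\big(\bigcup_n Z_n'\big) = 0$. In view of $(*)$, $\overline{\mathcal{A}}$ is therefore closed
under countable unions.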

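For (b), we take as given that $E_1 \,\Delta\, Z_1 = E_2 \,\Delta\, Z_2$ with $Z_1 \subseteq Z_1'$, $Z_2 \subseteq Z_2'$, and
$\mu(Z_1') = \mu(Z_2') = 0$. Then $(E_1 \,\Delta\, E_2) \,\Delta\, (Z_1 \,\Delta\, Z_2) = \varnothing$ and hence $E_1 \,\Delta\, E_2 =
Z_1 \,\Delta\, Z_2 \subseteq Z_1' \cup Z_2'$. Therefore $\mu(E_1 - E_2) \le \mu(E_1 \,\Delta\, E_2) \le \mu(Z_1' \cup Z_2') = 0$ and
similarly $\mu(E_2 - E_1) = 0$. It follows that $\mu(E_1) = \mu(E_1 - E_2) + \mu(E_1 \cap E_2) =
\mu(E_1 \cap E_2) = \mu(E_2 - E_1) + \mu(E_1 \cap E_2) = \mu(E_2)$, and $\overline{\mu}$ is unambiguously
defined.

For (c), we see from $(*)$ that $\overline{\mu}$ can be defined equivalently by $\overline{\mu}(E \cup Z) = \mu(E)$
if $Z \subseteq Z'$ and $\mu(Z') = 0$. If a disjoint sequence $E_n \cup Z_n$ is given, then we find
that $\overline{\mu}\big(\bigcup_n (E_n \cup Z_n)\big) = \overline{\mu}\big(\big(\bigcup_n E_n\big) \cup \big(\bigcup_n Z_n\big)\big) = \mu\big(\bigcup_n E_n\big) = \sum_n \mu(E_n) =
\sum_n \overline{\mu}(E_n \cup Z_n)$, and complete additivity is proved. Taking $Z = \varnothing$ in the definition
$\overline{\mu}(E \cup Z) = \mu(E)$, we obtain $\overline{\mu}(E) = \mu(E)$ for $E$ in $\mathcal{A}$.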

For (d), we use $(*)$ as the description of the sets in $\overline{\mathcal{A}}$. Let $E \cup Z$ be in $\overline{\mathcal{A}}$
with $E$ in $\mathcal{A}$, $Z \subseteq Z'$, and $Z'$ in $\mathcal{A}$ with $\mu(Z') = 0$. Then Proposition 5.1f gives
$\tilde{\mu}(E \cap Z) \le \tilde{\mu}(Z) \le \tilde{\mu}(Z') = \mu(Z') = 0$, so that $\tilde{\mu}(E \cap Z) = \tilde{\mu}(Z) = 0$.
Meanwhile, Proposition 5.1b gives $\tilde{\mu}(E \cup Z) + \tilde{\mu}(E \cap Z) = \tilde{\mu}(E) + \tilde{\mu}(Z)$.
Hence $\tilde{\mu}(E \cup Z) = \tilde{\mu}(E) = \mu(E)$.

For (e), it is immediate that $\mu_*(E) \le \mu^*(E)$ for every subset $E$ of $X$. Let
$E = C \cup Z$ be in $\overline{\mathcal{A}}$ with $C$ in $\mathcal{A}$, $Z \subseteq Z'$, and $Z'$ in $\mathcal{A}$ with $\mu(Z') = 0$. Then
$\mu(C) \le \mu_*(E) \le \mu^*(E) \le \mu(C \cup Z') \le \mu(C) + \mu(Z') = \mu(C)$. Since the
expressions at the ends are equal, we must have equality throughout, and therefore
$\mu_*(E) = \mu^*(E)$.

In the converse direction let $\mu_*(E) = \mu^*(E)$. We can find a sequence of sets
$A_n \in \mathcal{A}$ contained in $E$ with $\lim \mu(A_n) = \mu_*(E)$, and we may assume without
loss of generality that $\{A_n\}$ is an increasing sequence. Similarly we can find a
decreasing sequence of sets $B_n \in \mathcal{A}$ containing $E$ with $\lim \mu(B_n) = \mu^*(E)$. Let
$A = \bigcup_n A_n$ and $B = \bigcap_n B_n$. When combined with the equality $\mu_*(E) = \mu^*(E)$,
Proposition 5.2 and Corollary 5.3 show that $\mu(A) = \mu_*(E) = \mu^*(E) = \mu(B)$.
Since $A \subseteq E \subseteq B$, we have $\mu(B - A) = \mu(B) - \mu(A) = 0$ and $E = A \cup (E - A)$
with $E - A \subseteq B - A$ and $\mu(B - A) = 0$. By $(*)$, $E$ is in $\overline{\mathcal{A}}$.
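
For a finite measure space the completion can be enumerated directly, which makes it possible to compare the definition, the description $(*)$, and the criterion in (e). The following is a minimal sketch under assumed data that is not in the text: $X = \{0, \dots, 4\}$ and a $\sigma$-algebra generated by three hypothetical atoms, one of which has measure 0.

```python
from fractions import Fraction
from itertools import combinations


def powerset(points):
    """All subsets of a finite set, as frozensets."""
    items = sorted(points)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


X = frozenset(range(5))
# Hypothetical atoms of the sigma-algebra A, with their measures.
atoms = {frozenset({0}): Fraction(1),
         frozenset({1, 2}): Fraction(1),
         frozenset({3, 4}): Fraction(0)}

# A consists of all unions of atoms.
A = {frozenset().union(*c) for r in range(len(atoms) + 1)
     for c in combinations(list(atoms), r)}


def mu(E):
    """Measure of a member of A: total mass of the atoms it contains."""
    return sum((w for atom, w in atoms.items() if atom <= E), Fraction(0))


# All subsets Z of some Z' in A with mu(Z') = 0.
negligible = {Z for Zp in A if mu(Zp) == 0 for Z in powerset(Zp)}

by_sym_diff = {E ^ Z for E in A for Z in negligible}   # the definition
by_union = {E | Z for E in A for Z in negligible}      # the description (*)


def inner(E):
    return max(mu(S) for S in A if S <= E)


def outer(E):
    return min(mu(S) for S in A if S >= E)


by_measures = {E for E in powerset(X) if inner(E) == outer(E)}   # part (e)

# Part (b): every way of writing a set as E ^ Z gives the same mu(E).
values = {}
for E in A:
    for Z in negligible:
        values.setdefault(E ^ Z, set()).add(mu(E))
well_defined = all(len(v) == 1 for v in values.values())

print(len(by_sym_diff), by_sym_diff == by_union == by_measures, well_defined)
```

In this model the set $\{3\}$ belongs to the completion, while $\{1\}$ does not, because its inner measure is 0 and its outer measure is 1.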


A variant of Proposition 5.38e and its proof identifies the $\sigma$-algebra on which
the extended measure is constructed in the proof of the Extension Theorem (Theorem 5.5)
in the special case we considered. In the special case of the Extension
Theorem, the given ring of sets is an algebra $\mathcal{A}$, and $\nu(X)$ is finite. The set
function $\nu$ gets extended to a measure $\mu$ on a $\sigma$-algebra $\mathcal{B}$ that contains the smallest
$\sigma$-algebra $\mathcal{C}$ containing $\mathcal{A}$. The sets of $\mathcal{B}$ are those for which $\mu_*(E) = \mu^*(E)$,
where


\[
\mu^*(E) = \inf_{U \supseteq E,\ U \in \mathcal{U}} \mu^*(U) \quad\text{and}\quad \mu_*(E) = \sup_{K \subseteq E,\ K \in \mathcal{K}} \mu_*(K),
\]


$\mathcal{K}$ and $\mathcal{U}$ having been defined in terms of countable intersections and countable
unions, respectively, from $\mathcal{A}$. The variant of Proposition 5.38e is that a subset
$E$ of $X$ has $\mu_*(E) = \mu^*(E)$ if and only if $E$ is of the form $C \cup Z$ with $C$ in $\mathcal{C}$,
$Z \subseteq Z'$, and $Z'$ in $\mathcal{C}$ with $\mu(Z') = 0$. In other words, $(X, \mathcal{B}, \mu)$ is the completion
of $(X, \mathcal{C}, \mu)$.

The proof is modeled on the proof of Proposition 5.38e. If $E = C \cup Z$ is
a set in $\overline{\mathcal{C}}$ with $C$ in $\mathcal{C}$, $Z \subseteq Z'$, and $Z'$ in $\mathcal{C}$ with $\mu(Z') = 0$, then $\mu(C) \le
\mu_*(E) \le \mu^*(E) \le \mu(C \cup Z') \le \mu(C) + \mu(Z') = \mu(C)$. We conclude that
$\mu_*(E) = \mu^*(E)$.

In the converse direction let $\mu_*(E) = \mu^*(E)$. We can find an increasing
sequence of sets $A_n \in \mathcal{K} \subseteq \mathcal{C}$ contained in $E$ with $\lim \mu(A_n) = \mu_*(E)$, and
we can find a decreasing sequence of sets $B_n \in \mathcal{U} \subseteq \mathcal{C}$ containing $E$ with
$\lim \mu(B_n) = \mu^*(E)$. Let $A = \bigcup_n A_n$ and $B = \bigcap_n B_n$. Arguing as in the proof
of Proposition 5.38e, we have $\mu(A) = \mu_*(E) = \mu^*(E) = \mu(B)$, $\mu(B - A) =
\mu(B) - \mu(A) = 0$, and $E = A \cup (E - A)$ with $E - A \subseteq B - A$ and $\mu(B - A) = 0$.
Thus $E = C \cup Z$ with $C = A$ and $Z = E - A$.

This calculation has the following interesting consequence.


Proposition 5.39. In $\mathbb{R}^1$, the Lebesgue measurable sets of measure 0 are
exactly the subsets $E$ of $\mathbb{R}^1$ with the following property: for any $\epsilon > 0$, the set $E$
can be covered by countably many intervals of total length less than $\epsilon$.


PROOF. Within a bounded interval $[a, b]$, the above remarks apply and show
that the Lebesgue measurable sets of measure 0 are the sets $E$ with $\mu^*(E) = 0$,
where $\mu^*(E) = \inf_{U \supseteq E,\ U \in \mathcal{U}} \mu^*(U)$. The sets $U$ defining $\mu^*(E)$ are countable
unions of intervals, and the proposition follows for subsets of any bounded interval
$[a, b]$.

For general sets $E$ in $\mathbb{R}^1$, if the covering condition holds, then Proposition 5.1g
shows that $E$ has Lebesgue measure 0. Conversely if $E$ is Lebesgue measurable
of measure 0, then $E \cap [-N, N]$ is a bounded set of measure 0 and can be covered
by countably many intervals of arbitrarily small total length. Let us arrange that
the total length is $< 2^{-N}\epsilon$. Taking the union of these sets of intervals as $N$ varies,
we obtain a cover of $E$ by countably many intervals of total length less than $\epsilon$.
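
The covering condition is easy to realize for a countable set, which is the most familiar kind of set of measure 0. The sketch below is an illustration rather than part of the proof: it assumes a particular enumeration of some rationals in $[0, 1]$ (the helper names are hypothetical) and gives the $k$-th point an interval of length $\epsilon / 2^{k+2}$, so that the total length stays below $\epsilon$ no matter how many points are listed. The same geometric-series bookkeeping underlies the choice $2^{-N}\epsilon$ in the proof.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_denominator):
    """Enumerate the rationals p/q in [0, 1] with q <= max_denominator, without repeats."""
    seen = []
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.append(r)
    return seen


def cover(points, eps):
    """Give the k-th point an open interval of length eps / 2**(k + 2)."""
    intervals = []
    for k, p in enumerate(points):
        half = eps / 2 ** (k + 3)
        intervals.append((p - half, p + half))
    return intervals


eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
print(len(points), float(total_length), total_length < eps)
```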


7. Fubini’s Theorem for the Lebesgue Integral


Fubini’s Theorem for the Lebesgue integral concerns the interchange of order
of integration of functions of two variables, just as with the Riemann integral
in Section III.9. In the case of Euclidean space $\mathbb{R}^n$, we could have constructed
Lebesgue measure in each dimension by a procedure similar to the one we used
for $\mathbb{R}^1$. Then Fubini’s Theorem relates integration of a function of $m+n$ variables
over a set by either integrating in all variables at once or integrating in the first
$m$ variables first or integrating in the last $n$ variables first. In the context of more
general measure spaces, we need to develop the notion of the product of two
measure spaces. This corresponds to knowing $\mathbb{R}^m$ and $\mathbb{R}^n$ with their Lebesgue
measures and to constructing $\mathbb{R}^{m+n}$ with its Lebesgue measure.

In the theorem as we shall state it, we are given two measure spaces $(X, \mathcal{A}, \mu)$
and $(Y, \mathcal{B}, \nu)$, and we assume that both $\mu$ and $\nu$ are $\sigma$-finite. We shall construct a
product measure space $(X \times Y, \mathcal{A} \times \mathcal{B}, \mu \times \nu)$, and the formula in question will
be


\[
\begin{aligned}
\int_{X \times Y} f \, d(\mu \times \nu) &= \int_X \Big[ \int_Y f(x, y) \, d\nu(y) \Big] \, d\mu(x) \\
&= \int_Y \Big[ \int_X f(x, y) \, d\mu(x) \Big] \, d\nu(y).
\end{aligned}
\]


This formula will be valid for $f \ge 0$ measurable with respect to $\mathcal{A} \times \mathcal{B}$.

The technique of proof will be the standard one indicated in connection with
proving Corollary 5.28. We start with indicator functions, extend the result to
simple functions by linearity, and pass to the limit by the Monotone Convergence
Theorem (Theorem 5.25). It is then apparent that the difficult step is the case that
$f$ is an indicator function. In fact, it is not even clear in this special case that the
inside integral $\int_Y I_E(x, y) \, d\nu(y)$ is a measurable function of $x$, and this is the
step that requires some work.

We begin by describing $\mathcal{A} \times \mathcal{B}$, the $\sigma$-algebra of measurable sets for the product
$X \times Y$. Recall from Section A1 of the appendix that $X \times Y$ is defined as a set of
ordered pairs. If $A \subseteq X$ and $B \subseteq Y$, then the set of ordered pairs that constitute
$A \times B$ is a subset of $X \times Y$, and we call $A \times B$ a rectangle⁶ in $X \times Y$. The sets
$A$ and $B$ are called the sides of the rectangle.


Proposition 5.40. If $\mathcal{A}$ and $\mathcal{B}$ are algebras of subsets of nonempty sets $X$ and
$Y$, then the class $\mathcal{C}$ of all finite disjoint unions of rectangles $A \times B$ with $A$ in $\mathcal{A}$
and $B$ in $\mathcal{B}$ is an algebra of sets in $X \times Y$. In particular, a finite union of rectangles
is a finite disjoint union.


PROOF. The intersection of the rectangles $R_1 = A_1 \times B_1$ and $R_2 = A_2 \times B_2$ is
the rectangle $R = (A_1 \cap A_2) \times (B_1 \cap B_2)$ because both $R_1 \cap R_2$ and $R$ coincide
with the set $\{(x, y) \in X \times Y \mid x \in A_1,\ x \in A_2,\ y \in B_1,\ y \in B_2\}$. Therefore


\[
\Big( \bigcup_i (A_i \times B_i) \Big) \cap \Big( \bigcup_j (C_j \times D_j) \Big) = \bigcup_{i,j} \big( (A_i \cap C_j) \times (B_i \cap D_j) \big),
\]


and the right side is a disjoint union if both $\bigcup_i (A_i \times B_i)$ and $\bigcup_j (C_j \times D_j)$ are
disjoint unions. Moreover, the right side is in $\mathcal{C}$ if both unions on the left are in $\mathcal{C}$.
Therefore $\mathcal{C}$ is closed under finite intersections.

Certainly $\varnothing$ and $X \times Y$ are in $\mathcal{C}$. The identity


\[
(X \times Y) - (A \times B) = ((X - A) \times B) \cup (X \times (Y - B))
\]


exhibits the complement of a rectangle as a disjoint union of rectangles. Since 
the complement of a disjoint union is the intersection of the complements, C is 
closed under complementation. Thus C is an algebra of sets, and the proof is 
complete. 
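
The two set identities used in this proof can be confirmed on finite sets. The sketch below uses hypothetical sets $X$, $Y$ and sides $A_1, B_1, A_2, B_2$ chosen only for illustration; it checks that the intersection of two rectangles is a rectangle and that the complement of a rectangle splits into two disjoint rectangles.

```python
from itertools import product


def rect(A, B):
    """The rectangle A x B as a set of ordered pairs."""
    return frozenset(product(A, B))


X = frozenset(range(4))
Y = frozenset("abc")
A1, B1 = frozenset({0, 1}), frozenset("ab")
A2, B2 = frozenset({1, 2}), frozenset("bc")

# Intersection of two rectangles is the rectangle of the intersected sides.
intersection_ok = rect(A1, B1) & rect(A2, B2) == rect(A1 & A2, B1 & B2)

# Complement of a rectangle as a union of two rectangles.
left = rect(X - A1, B1)
right = rect(X, Y - B1)
complement_ok = rect(X, Y) - rect(A1, B1) == left | right
pieces_disjoint = not (left & right)

print(intersection_ok, complement_ok, pieces_disjoint)
```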


If $\mathcal{A}$ and $\mathcal{B}$ are $\sigma$-algebras in $X$ and $Y$, then we denote the smallest $\sigma$-algebra
containing the algebra $\mathcal{C}$ of the above proposition by $\mathcal{A} \times \mathcal{B}$. The set $X \times Y$,
together with the $\sigma$-algebra $\mathcal{A} \times \mathcal{B}$, is called a product space. The measurable
sets of $X \times Y$ are the sets of $\mathcal{A} \times \mathcal{B}$.


⁶The word “rectangle” was used with a different meaning in Chapter III, but there will be no
possibility of confusion for now. Starting in Chapter VI, both kinds of rectangles will be in play;
the ones in Chapter III can then be called “geometric rectangles” and the present ones can be called
“abstract rectangles.”

Let $E$ be any set in $X \times Y$. The section $E_x$ of $E$ determined by $x$ in $X$ is
defined by


\[
E_x = \{\, y \mid (x, y) \text{ is in } E \,\}.
\]


Similarly the section $E^y$ determined by $y$ in $Y$ is

\[
E^y = \{\, x \mid (x, y) \text{ is in } E \,\}.
\]

The section $E_x$ is a subset of $Y$, and the section $E^y$ is a subset of $X$.


Lemma 5.41. Let $\{E_\alpha\}$ be a class of subsets of $X \times Y$, and let $x$ be a point of
$X$. Then


(a) $\big(\bigcup_\alpha E_\alpha\big)_x = \bigcup_\alpha (E_\alpha)_x$,
(b) $\big(\bigcap_\alpha E_\alpha\big)_x = \bigcap_\alpha (E_\alpha)_x$,
(c) $(E_\alpha - E_\beta)_x = (E_\alpha)_x - (E_\beta)_x$ and, in particular, $(E_\beta^c)_x = Y - (E_\beta)_x$.

PROOF. These facts are special cases of the identities at the end of Section A1
of the appendix for inverse images of functions. In this case the function in
question is given by $f(y) = (x, y)$.
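
Sections are simple to compute when $X$ and $Y$ are finite, and the identities of Lemma 5.41 can then be tested directly. The following sketch assumes small hypothetical sets `E1`, `E2` and a hypothetical rectangle `R`; it also shows that a section of a rectangle is either empty or one of its sides.

```python
from itertools import product

X, Y = range(3), range(4)


def section_x(E, x):
    """E_x = {y | (x, y) is in E}."""
    return frozenset(y for (a, y) in E if a == x)


def section_y(E, y):
    """E^y = {x | (x, y) is in E}."""
    return frozenset(x for (x, b) in E if b == y)


full = frozenset(product(X, Y))
E1 = frozenset({(0, 0), (0, 1), (1, 2), (2, 3)})
E2 = frozenset({(0, 1), (1, 1), (1, 2), (2, 0)})

# Parts (a), (b), (c) of the lemma, checked for every x.
lemma_holds = all(
    section_x(E1 | E2, x) == section_x(E1, x) | section_x(E2, x)
    and section_x(E1 & E2, x) == section_x(E1, x) & section_x(E2, x)
    and section_x(E1 - E2, x) == section_x(E1, x) - section_x(E2, x)
    and section_x(full - E2, x) == frozenset(Y) - section_x(E2, x)
    for x in X
)

# Sections of a rectangle A x B.
A, B = frozenset({0, 2}), frozenset({1, 3})
R = frozenset(product(A, B))
rectangle_sections = {x: section_x(R, x) for x in X}

print(lemma_holds, rectangle_sections)
```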


Proposition 5.42. Let $\mathcal{A}$ and $\mathcal{B}$ be $\sigma$-algebras in $X$ and $Y$, and let $E$ be a
measurable set in $X \times Y$. Then every section $E_x$ is a measurable set in $Y$, and
every section $E^y$ is a measurable set in $X$.


PROOF. We prove the result for sections $E_x$, the proof for $E^y$ being completely
analogous. Let $\mathcal{E}$ be the class of all subsets $E$ of $X \times Y$ all of whose sections $E_x$
are in $\mathcal{B}$. Then $\mathcal{E}$ contains all rectangles with measurable sides, since a section
of a rectangle is either the empty set or one of the sides. By Lemma 5.41a, $\mathcal{E}$
is closed under finite unions. Hence $\mathcal{E}$ contains the algebra $\mathcal{C}$ of finite disjoint
unions of rectangles with measurable sides. By parts (a) and (c) of Lemma 5.41,
$\mathcal{E}$ is closed under countable unions and complements. It is therefore a $\sigma$-algebra
containing $\mathcal{C}$ and thus contains $\mathcal{A} \times \mathcal{B}$.


A corollary of Proposition 5.42 is that a rectangle in X x Y is measurable if 
and only if its sides are measurable. The sufficiency follows from the fact that 
a rectangle with measurable sides is in C, and the necessity follows from the 
proposition. 

From now on, we shall adhere to the convention that a rectangle is always 
assumed to be measurable. 

We turn to the implementation of the sketch of proof of Fubini’s Theorem
given earlier in this section. The basic question will be the equality of the iterated
integrals in either order when the integrand is an indicator function. If $E$ is
a measurable set in $X \times Y$, then we know from Proposition 5.42 that $E_x$ is a
measurable subset of $Y$. In order to form the iterated integral


\[
\int_X \Big[ \int_Y I_E(x, y) \, d\nu(y) \Big] \, d\mu(x),
\]


we compute the inside integral as $\nu(E_x)$, and we have to be able to form the
outside integral, which is $\int_X \nu(E_x) \, d\mu(x)$. That is, we need to know that $\nu(E_x)$
is a measurable function on $X$. For the iterated integral in the other order, we
need to know that $\mu(E^y)$ is measurable on $Y$.

The proof of this measurability is the hard step, since the class of sets $E$ for
which $\nu(E_x)$ and $\mu(E^y)$ are both measurable does not appear to be necessarily
a $\sigma$-algebra, even when $\mu$ and $\nu$ are finite measures. To deal with this difficulty,
we introduce the following terminology: a class of sets is called a monotone
class if it is closed under countable increasing unions and countable decreasing
intersections. It is readily verified that the class of all subsets of a set is a monotone
class and that the intersection of any nonempty family of monotone classes is a
monotone class; hence there is a smallest monotone class containing any given
class of sets.

The proof of the lemma below introduces the notation $\uparrow$ and $\downarrow$ to denote
increasing countable union and decreasing countable intersection, respectively.


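Lemma 5.43 (Monotone Class Lemma). The smallest monotone class $\mathcal{M}$
containing an algebra $\mathcal{A}$ of sets is identical to the smallest $\sigma$-algebra $\overline{\mathcal{A}}$ containing
$\mathcal{A}$.


PROOF. We have $\mathcal{M} \subseteq \overline{\mathcal{A}}$ because $\overline{\mathcal{A}}$ is a monotone class containing $\mathcal{A}$. To
prove the reverse inclusion, it is sufficient to show that $\mathcal{M}$ is closed under the
operations of finite union and complementation, since a countable union can be
written as the increasing countable union of finite unions. The proof is in three
steps.

First we prove that if $A$ is in $\mathcal{A}$ and $M$ is in $\mathcal{M}$, then $A \cup M$ and $A \cap M$ are
in $\mathcal{M}$. For fixed $A$ in $\mathcal{A}$, let $\mathcal{U}_A$ be the class of all sets $M$ in $\mathcal{M}$ such that $A \cup M$
and $A \cap M$ are in $\mathcal{M}$. Then $\mathcal{U}_A \supseteq \mathcal{A}$. If we show that $\mathcal{U}_A$ is a monotone class,
then it will follow that $\mathcal{U}_A \supseteq \mathcal{M}$. For this purpose let


\[
U_n \uparrow U \quad\text{and}\quad V_n \downarrow V \quad\text{with } U_n \text{ and } V_n \text{ in } \mathcal{U}_A.
\]

By definition of $\mathcal{U}_A$, the sets $U_n \cup A$, $U_n \cap A$, $V_n \cup A$, and $V_n \cap A$ are in $\mathcal{M}$. But


\[
\begin{aligned}
U_n \cup A \uparrow U \cup A \quad&\text{and}\quad U_n \cap A \uparrow U \cap A, \\
V_n \cup A \downarrow V \cup A \quad&\text{and}\quad V_n \cap A \downarrow V \cap A.
\end{aligned}
\]


Therefore $U$ and $V$ are in $\mathcal{U}_A$, and $\mathcal{U}_A$ is a monotone class.

Second we prove that $\mathcal{M}$ is closed under finite unions. For fixed $N$ in $\mathcal{M}$, let
$\mathcal{U}_N$ be the class of all sets $M$ in $\mathcal{M}$ such that $N \cup M$ and $N \cap M$ are in $\mathcal{M}$. Then
$\mathcal{U}_N \supseteq \mathcal{A}$ by the previous step. The same argument as in that step shows that $\mathcal{U}_N$
is a monotone class, and hence $\mathcal{U}_N = \mathcal{M}$.

Third we prove that $\mathcal{M}$ is closed under complements. Let $\mathcal{N}$ be the class of
all sets in $\mathcal{M}$ whose complements are in $\mathcal{M}$. Then $\mathcal{N} \supseteq \mathcal{A}$, and it is enough to
show that $\mathcal{N}$ is a monotone class. If


\[
C_n \uparrow C \quad\text{and}\quad D_n \downarrow D \quad\text{with } C_n \text{ and } D_n \text{ in } \mathcal{N},
\]

then $C$ and $D$ are in $\mathcal{M}$ since $C_n$ and $D_n$ are in $\mathcal{M}$. Now

\[
C_n^c \downarrow C^c \quad\text{and}\quad D_n^c \uparrow D^c,
\]


and by definition of $\mathcal{N}$, $C_n^c$ and $D_n^c$ are in $\mathcal{M}$. Therefore $C^c$ and $D^c$ are in $\mathcal{M}$,
and $C$ and $D$ must be in $\mathcal{N}$. That is, $\mathcal{N}$ is a monotone class.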


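Lemma 5.44. If $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are $\sigma$-finite measure spaces, then
$\nu(E_x)$ and $\mu(E^y)$ are measurable functions for every $E$ in $\mathcal{A} \times \mathcal{B}$.


PROOF IF $\mu(X) < +\infty$ AND $\nu(Y) < +\infty$. Let $\mathcal{M}$ be the class of all sets $E$
in $\mathcal{A} \times \mathcal{B}$ for which $\nu(E_x)$ and $\mu(E^y)$ are measurable. We shall show that $\mathcal{M}$ is
a monotone class containing the algebra $\mathcal{C}$ of finite disjoint unions of rectangles.
If $R = A \times B$ is a rectangle, then


\[
\nu(R_x) = \nu(B) I_A \quad\text{and}\quad \mu(R^y) = \mu(A) I_B,
\]

and so $R$ is in $\mathcal{M}$. If $E$ and $F$ are disjoint sets in $\mathcal{M}$, then

\[
\nu((E \cup F)_x) = \nu(E_x \cup F_x) = \nu(E_x) + \nu(F_x)
\]


for each $x$, and similarly for $\mu$ for each $y$. By Proposition 5.7, $\nu((E \cup F)_x)$ and
$\mu((E \cup F)^y)$ are measurable. Hence $E \cup F$ is in $\mathcal{M}$, and $\mathcal{M}$ contains $\mathcal{C}$. If $\{E_n\}$
and $\{F_n\}$ are increasing and decreasing sequences of sets in $\mathcal{M}$, then the finiteness
and complete additivity of $\nu$ imply that


\[
\begin{aligned}
\nu\Big(\Big(\bigcup_n E_n\Big)_x\Big) &= \nu\Big(\bigcup_n (E_n)_x\Big) = \lim_n \nu((E_n)_x) \\
\text{and}\quad \nu\Big(\Big(\bigcap_n F_n\Big)_x\Big) &= \nu\Big(\bigcap_n (F_n)_x\Big) = \lim_n \nu((F_n)_x),
\end{aligned}
\]


and similarly for $\mu$. Since the limit of measurable functions is measurable
(Corollary 5.10), we conclude that $\mathcal{M}$ is a monotone class. Therefore $\mathcal{M}$ contains
$\mathcal{A} \times \mathcal{B}$ by the Monotone Class Lemma (Lemma 5.43).

PROOF FOR $\sigma$-FINITE $\mu$ AND $\nu$. Write $X = \bigcup_{m=1}^{\infty} X_m$ and $Y = \bigcup_{n=1}^{\infty} Y_n$
disjointly, with $\mu(X_m) < +\infty$ and $\nu(Y_n) < +\infty$ for all $m$ and $n$. Define $\mathcal{A}_m$ and
$\mathcal{B}_n$ by


\[
\mathcal{A}_m = \{\, A \cap X_m \mid A \text{ is in } \mathcal{A} \,\} \quad\text{and}\quad \mathcal{B}_n = \{\, B \cap Y_n \mid B \text{ is in } \mathcal{B} \,\},
\]


and define $\mu_m$ and $\nu_n$ on $\mathcal{A}_m$ and $\mathcal{B}_n$ by restriction from $\mu$ and $\nu$. Then the triples
$(X_m, \mathcal{A}_m, \mu_m)$ and $(Y_n, \mathcal{B}_n, \nu_n)$ are finite measure spaces, and the previous case
applies. If $E$ is in $\mathcal{A} \times \mathcal{B}$, then $E_{mn} = E \cap (X_m \times Y_n)$ is in $\mathcal{A}_m \times \mathcal{B}_n$, and so
$\nu((E_{mn})_x)$ and $\mu((E_{mn})^y)$ are measurable with respect to $\mathcal{A}_m$ and $\mathcal{B}_n$, hence with
respect to $\mathcal{A}$ and $\mathcal{B}$. Thus


\[
\nu(E_x) = \sum_{m,n} \nu((E_{mn})_x) \quad\text{and}\quad \mu(E^y) = \sum_{m,n} \mu((E_{mn})^y)
\]


exhibit $\nu(E_x)$ and $\mu(E^y)$ as countable sums of nonnegative measurable functions.
They are therefore measurable.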


The next proposition simultaneously constructs the product measure and es- 
tablishes Fubini’s Theorem for indicator functions. 


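Proposition 5.45. Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces.
Then there exists a unique measure $\mu \times \nu$ on $\mathcal{A} \times \mathcal{B}$ such that


\[
(\mu \times \nu)(A \times B) = \mu(A)\nu(B)
\]


for every rectangle $A \times B$. The measure $\mu \times \nu$ is $\sigma$-finite, and

\[
(\mu \times \nu)(E) = \int_X \nu(E_x) \, d\mu(x) = \int_Y \mu(E^y) \, d\nu(y)
\]


for every set $E$ in $\mathcal{A} \times \mathcal{B}$.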


PROOF. In view of the measurability of $\nu(E_x)$ given in Lemma 5.44, we can
define a set function $\rho$ on $\mathcal{A} \times \mathcal{B}$ by


\[
\rho(E) = \int_X \nu(E_x) \, d\mu(x).
\]

Then $\rho(\varnothing) = 0$, and $\rho$ is nonnegative. On a rectangle $A \times B$, we have


\[
\rho(A \times B) = \mu(A)\nu(B) \tag{$*$}
\]


since $\nu((A \times B)_x) = \nu(B) I_A(x)$. We shall show that $\rho$ is completely additive. If
$\{E_n\}$ is a disjoint sequence in $\mathcal{A} \times \mathcal{B}$, then

\[
\begin{aligned}
\rho\Big(\bigcup_n E_n\Big) &= \int_X \nu\Big(\Big(\bigcup_n E_n\Big)_x\Big) \, d\mu(x) && \text{by definition of } \rho \\
&= \int_X \nu\Big(\bigcup_n (E_n)_x\Big) \, d\mu(x) && \text{by Lemma 5.41a} \\
&= \int_X \Big[ \sum_n \nu((E_n)_x) \Big] \, d\mu(x) && \text{since the sets } (E_n)_x \text{ are disjoint for each fixed } x \\
&= \sum_n \int_X \nu((E_n)_x) \, d\mu(x) && \text{by Corollary 5.27} \\
&= \sum_n \rho(E_n).
\end{aligned}
\]


Now $X \times Y = \bigcup_{m,n} (X_m \times Y_n)$. Since $\rho$ has just been shown to be completely
additive and since $\mu$ and $\nu$ are $\sigma$-finite, $(*)$ shows that $\rho$ is $\sigma$-finite. Also, $(*)$
completely determines $\rho$ on the algebra $\mathcal{C}$ of finite disjoint unions of rectangles.
By the Extension Theorem (Theorem 5.5), $\rho$ is completely determined on the
smallest $\sigma$-algebra $\mathcal{A} \times \mathcal{B}$ containing $\mathcal{C}$.

Defining $\sigma(E) = \int_Y \mu(E^y) \, d\nu(y)$ and arguing in the same way, we see that $\sigma$
is a measure on $\mathcal{A} \times \mathcal{B}$ agreeing with $\rho$ on rectangles and determined on $\mathcal{A} \times \mathcal{B}$
by its values on rectangles. Thus we have $\rho = \sigma$ on $\mathcal{A} \times \mathcal{B}$, and can define
$\mu \times \nu = \rho = \sigma$ to complete the proof.
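
When $X$ and $Y$ are finite and every subset is measurable, the two integrals in Proposition 5.45 become finite weighted sums, and their agreement can be observed directly. The sketch below is an illustration with hypothetical point masses `mu` and `nu` and a hypothetical set `E`; the names `rho` and `sigma` follow the set functions in the proof.

```python
from fractions import Fraction
from itertools import product

# Hypothetical point masses on X = {"a", "b", "c"} and Y = {0, 1}.
mu = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(0)}
nu = {0: Fraction(2), 1: Fraction(1, 3)}


def measure(weights, subset):
    """Measure of a subset as the sum of its point masses."""
    return sum((weights[p] for p in subset), Fraction(0))


def rho(E):
    """Integral over X of nu(E_x) with respect to mu."""
    return sum((mu[x] * measure(nu, {y for (a, y) in E if a == x}) for x in mu),
               Fraction(0))


def sigma(E):
    """Integral over Y of mu(E^y) with respect to nu."""
    return sum((nu[y] * measure(mu, {x for (x, b) in E if b == y}) for y in nu),
               Fraction(0))


E = {("a", 0), ("b", 0), ("b", 1), ("c", 1)}
A, B = {"a", "b"}, {1}
rectangle = set(product(A, B))

print(rho(E), sigma(E))
print(rho(rectangle), measure(mu, A) * measure(nu, B))
```

Both orders give the same value on `E`, and on the rectangle the common value is the product of the measures of the sides.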


Lemma 5.46. If $f$ is a measurable function defined on a product space $X \times Y$,
then for each $x$ in $X$, $y \mapsto f(x, y)$ is a measurable function on $Y$, and for each
$y$ in $Y$, $x \mapsto f(x, y)$ is a measurable function on $X$.


PROOF. For each fixed $x$, the formula


\[
\{\, y \mid f(x, y) > c \,\} = \{\, (x, y) \mid f(x, y) > c \,\}_x
\]


exhibits the set on the left as a section of a measurable set, which must be measurable
according to Proposition 5.42. The result for fixed $y$ is proved similarly.


Theorem 5.47 (Fubini’s Theorem). Let $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite
measure spaces, and let $(X \times Y, \mathcal{A} \times \mathcal{B}, \mu \times \nu)$ be the product measure space.
If $f$ is a nonnegative measurable function on $X \times Y$, then $\int_Y f(x, y) \, d\nu(y)$ and
$\int_X f(x, y) \, d\mu(x)$ are measurable, and


\[
\begin{aligned}
\int_{X \times Y} f \, d(\mu \times \nu) &= \int_X \Big[ \int_Y f(x, y) \, d\nu(y) \Big] \, d\mu(x) \\
&= \int_Y \Big[ \int_X f(x, y) \, d\mu(x) \Big] \, d\nu(y).
\end{aligned}
\]

PROOF. Lemma 5.46 shows that $f(x, y)$ is measurable in each variable separately
and hence that the inside integrals in the conclusion are well defined. If
$f$ is the indicator function of a measurable subset $E$ of $X \times Y$, then the theorem
reduces to Proposition 5.45. The result immediately extends to the case of a
simple function $f \ge 0$.

Now let $f$ be an arbitrary nonnegative measurable function. Find by Proposition 5.11
an increasing sequence of simple functions $s_n \ge 0$ with pointwise
limit $f$. The sequence of functions $\int_Y s_n(x, y) \, d\nu(y)$ is an increasing sequence
of nonnegative functions, and each is measurable by what we have already shown
for simple functions. By the Monotone Convergence Theorem (Theorem 5.25),


\[
\lim_n \int_Y s_n(x, y) \, d\nu(y) = \int_Y \lim_n s_n(x, y) \, d\nu(y) = \int_Y f(x, y) \, d\nu(y).
\]


Therefore $\int_Y f(x, y) \, d\nu(y)$ is the pointwise limit of measurable functions and is
measurable. Similarly $\int_X f(x, y) \, d\mu(x)$ is measurable.
For every $n$, the result for simple functions gives


\[
\int_{X \times Y} s_n \, d(\mu \times \nu) = \int_X \Big[ \int_Y s_n(x, y) \, d\nu(y) \Big] \, d\mu(x).
\]


By a second application of monotone convergence,


\[
\int_{X \times Y} f \, d(\mu \times \nu) = \lim_n \int_{X \times Y} s_n \, d(\mu \times \nu) = \lim_n \int_X \Big[ \int_Y s_n(x, y) \, d\nu(y) \Big] \, d\mu(x).
\]


By a third application of monotone convergence,


\[
\lim_n \int_X \Big[ \int_Y s_n(x, y) \, d\nu(y) \Big] \, d\mu(x) = \int_X \Big[ \lim_n \int_Y s_n(x, y) \, d\nu(y) \Big] \, d\mu(x).
\]


Putting our results together, we obtain


\[
\int_{X \times Y} f \, d(\mu \times \nu) = \int_X \Big[ \int_Y f(x, y) \, d\nu(y) \Big] \, d\mu(x).
\]


The other equality of the conclusion follows by interchanging the roles of $X$
and $Y$.
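
The passage from simple functions to a general nonnegative $f$ can be watched numerically on finite spaces. The sketch below rests on assumptions that are not in the text: the point masses, the function `f`, and the dyadic truncation `s`, which is one common way to build an increasing sequence of simple functions and is not necessarily the construction used in Proposition 5.11. It compares the two iterated integrals of each $s_n$ and shows them increasing to the iterated integral of $f$.

```python
from fractions import Fraction
from math import floor

# Hypothetical point masses on X = {0, 1, 2} and Y = {0, 1, 2, 3}.
mu = {0: Fraction(1), 1: Fraction(1, 2), 2: Fraction(2)}
nu = {0: Fraction(1, 3), 1: Fraction(1), 2: Fraction(1), 3: Fraction(3)}


def f(x, y):
    """A nonnegative function on X x Y, chosen only for illustration."""
    return Fraction(x * x + y, 4)


def s(n, x, y):
    """Dyadic simple function: f rounded down to a multiple of 2**-n, capped at n."""
    return min(Fraction(n), Fraction(floor(f(x, y) * 2 ** n), 2 ** n))


def x_outside(g):
    """Integrate in y first (inside integral), then in x."""
    return sum(mu[x] * sum(nu[y] * g(x, y) for y in nu) for x in mu)


def y_outside(g):
    """Integrate in x first (inside integral), then in y."""
    return sum(nu[y] * sum(mu[x] * g(x, y) for x in mu) for y in nu)


approximations = [x_outside(lambda x, y, n=n: s(n, x, y)) for n in range(1, 5)]
orders_agree = all(
    x_outside(lambda x, y, n=n: s(n, x, y)) == y_outside(lambda x, y, n=n: s(n, x, y))
    for n in range(1, 5)
)

print([float(v) for v in approximations], float(x_outside(f)), orders_agree)
```

Because the values of this particular `f` are multiples of $1/4$ and are at most 2, the approximations already coincide with $f$ from $n = 2$ on; for a general $f$ one would only see convergence.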


Fubini’s Theorem arises surprisingly often in practice. In some applications
the theorem is applied at least in part to prove that an integral with a parameter
is finite or is 0 for almost every value of the parameter. Here is a general result
concerning integral 0.

Corollary 5.48. Suppose that $(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ are $\sigma$-finite measure
spaces, and suppose that $E$ is a measurable subset of $X \times Y$ such that


\[
\nu(\{\, y \mid (x, y) \in E \,\}) = 0
\]


for almost every $x$ $[d\mu]$. Then $\mu(\{\, x \mid (x, y) \in E \,\}) = 0$ for almost every $y$ $[d\nu]$.


REMARKS. In words, if the $x$ section of $E$ has $\nu$ measure 0 for almost every
$x$ in $X$, then the $y$ section of $E$ has $\mu$ measure 0 for almost every $y$ in $Y$. For
example, if one-point sets in $X$ and $Y$ have measure 0 and if every $x$ section of
$E$ is a finite subset of $Y$, then for almost every $y$ in $Y$, the $y$ section of $E$ has
measure 0 in $X$.


PROOF. Apply Fubini’s Theorem to $I_E$. The iterated integrals are equal, and
the hypothesis makes one of them be 0. Then the other one must be 0, and the
conclusion follows.


When one tries to drop the hypothesis in Fubini’s Theorem that the integrand
is nonnegative, some finiteness condition is needed, and the result in the form of
Theorem 5.47 is often used to establish this finiteness. Specifically suppose that
$f$ is measurable with respect to $\mathcal{A} \times \mathcal{B}$ but is not necessarily nonnegative. The
assumption will be that one of the iterated integrals


\[
\int_X \Big[ \int_Y |f(x, y)| \, d\nu(y) \Big] \, d\mu(x) \quad\text{and}\quad \int_Y \Big[ \int_X |f(x, y)| \, d\mu(x) \Big] \, d\nu(y)
\]


is finite. Then the conclusions are that


(a) $f$ is integrable with respect to $\mu \times \nu$;

(b) $\int_Y f(x, y) \, d\nu(y)$ is defined for almost every $x$ $[d\mu]$; if it is redefined to
be 0 on the exceptional set, then it is measurable and is in fact integrable
$[d\mu]$;

(c) a similar conclusion is valid for $\int_X f(x, y) \, d\mu(x)$;

(d) after the redefinitions in (b) and (c), the double integral equals each
iterated integral, and the two iterated integrals are equal.


These conclusions follow immediately by applying Fubini’s Theorem to $f^+$ and
$f^-$ separately and subtracting. The redefinitions in (b) and (c) are what make the
subtractions of integrands everywhere defined.
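
A standard example, not taken from the text, shows why a finiteness condition cannot simply be dropped. Take $X = Y = \{0, 1, 2, \dots\}$ with counting measure and let $a(m, n)$ be $1$ when $m = n$, $-1$ when $m = n + 1$, and $0$ otherwise. Every row and every column has only finitely many nonzero entries, so the inside sums can be computed exactly; the sketch below then truncates only the outside sums and the absolute mass to the first `N` indices.

```python
def a(m, n):
    """Entry of the double sequence: +1 on the diagonal, -1 just below it."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


def row_sum(m):
    """Sum over n of a(m, n); the terms vanish for n > m."""
    return sum(a(m, n) for n in range(m + 1))


def column_sum(n):
    """Sum over m of a(m, n); the terms vanish for m > n + 1."""
    return sum(a(m, n) for m in range(n + 2))


N = 50
rows_first = sum(row_sum(m) for m in range(N))          # inside sum over n
columns_first = sum(column_sum(n) for n in range(N))    # inside sum over m
absolute_mass = sum(abs(a(m, n)) for m in range(N) for n in range(N))

print(rows_first, columns_first, absolute_mass)
```

The two iterated sums stay at 1 and 0 for every `N`, while the sum of absolute values grows without bound, so neither iterated integral of $|f|$ is finite and the conclusions (a) through (d) are not available.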

One final remark is in order: The completion of A x B is not necessarily the 
same as the product of the completions of A and B, and thus the statement of 
Fubini’s Theorem requires some modification if completions of measure spaces 
are to be used. We shall see in the next chapter that Borel sets in Euclidean space 
behave well under the formation of product spaces, but Lebesgue measurable sets 
do not. Thus it simplifies matters to stick to integration of Borel-measurable sets 
in Euclidean space whenever possible.


8. Integration of Complex-Valued and Vector-Valued Functions


Fix a measure space $(X, \mathcal{A}, \mu)$. In this chapter we have worked so far with
measurable functions on $X$ whose values are in $\mathbb{R}^*$, dividing them into two classes
as far as integration is concerned. One class consists of measurable functions with
values in $[0, +\infty]$, and we defined the integral of any such function as a member
of $[0, +\infty]$. The other class consists of general measurable functions with values
in $\mathbb{R}^*$. The integral in this case can end up being anything in $\mathbb{R}^*$, and there are
some such functions for which the integral is not defined.

It is important in the theory to be able to integrate functions whose values
are complex numbers or vectors in $\mathbb{R}^m$ or $\mathbb{C}^m$, and it will not be productive to
allow the same broad treatment of infinities as was done for general functions
with values in $\mathbb{R}^*$. On the other hand, it is desirable to have the flexibility with
nonnegative measurable functions of being able to treat infinite values and infinite
integrals in the same way as finite values and finite integrals. In order to have
two theories, rather than three, once we pass to vector-valued functions, we shall
restrict somewhat the theory we have already developed for general functions
with values in $\mathbb{R}^*$.

Let us label these two theories of integration as the one for scalar-valued non- 
negative measurable functions and the one for integrable vector-valued functions. 
The first of these theories has already been established and needs no change. The 
second of these theories needs some definitions and comments that in part repeat 
steps taken with Riemann integration in Sections 1.5, III.3, and III.7 and in part 
are new. In applications of this second theory later, if the term “vector-valued” 
is not included in a reference to a function either explicitly or by implication, the 
convention is that the function is scalar-valued. 

In the theory for vector-valued functions, we shall be assuming integrability, 
and the integrability will force the function to have meaningful finite values almost 
everywhere. Our convention will be that the values are finite everywhere. This 
will not be a serious restriction for any function that can be considered integrable, 
since we can redefine such a function on a certain set of measure 0 to be 0, and 
then the condition will be met without any changes in the values of integrals. 

Thus let a function $f : X \to \mathbb{C}^m$ be given. Since the function can have
its image contained in $\mathbb{R}^m$, we will be handling $\mathbb{R}^m$-valued functions at the same
time. Since $m$ can be 1, we will be handling complex-valued functions at the same
time. Since the image can be in $\mathbb{R}^m$ and $m$ can be 1, we will at the same time
be recasting our theory of real-valued functions whose values are not necessarily
nonnegative. We impose the usual Hermitian inner product $(\cdot, \cdot)$ and norm $|\cdot|$
on $\mathbb{C}^m$.

The function $\bar{f} : X \to \mathbb{C}^m$ is the composition of $f$ followed by complex
conjugation in each entry of $\mathbb{C}^m$. We can write $f = \mathrm{Re}\, f + i\,\mathrm{Im}\, f$, where
$\mathrm{Re}\, f = \frac{1}{2}(f + \bar{f})$ and $\mathrm{Im}\, f = \frac{1}{2i}(f - \bar{f})$, and then the functions $\mathrm{Re}\, f$ and $\mathrm{Im}\, f$
take values in $\mathbb{R}^m$. Following the convention in Section A7 of the appendix, let
$\{u_1, \dots, u_m\}$ be the standard basis of $\mathbb{R}^m$.
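
The formulas for $\mathrm{Re}\, f$ and $\mathrm{Im}\, f$ act entry by entry, so they can be checked on a single vector in $\mathbb{C}^m$. The sketch below uses Python's built-in complex numbers and a hypothetical vector `v` with $m = 3$; it confirms that both parts have real entries and that $v = \mathrm{Re}\, v + i\, \mathrm{Im}\, v$.

```python
def conjugate(v):
    """Complex conjugation in each entry of a vector in C^m."""
    return tuple(z.conjugate() for z in v)


def real_part(v):
    """Re v = (v + conj v) / 2, computed entry by entry."""
    return tuple((z + w) / 2 for z, w in zip(v, conjugate(v)))


def imag_part(v):
    """Im v = (v - conj v) / (2i), computed entry by entry."""
    return tuple((z - w) / 2j for z, w in zip(v, conjugate(v)))


v = (1.5 + 2j, -3 + 0.25j, 4 + 0j)   # hypothetical vector in C^3
re_v = real_part(v)
im_v = imag_part(v)

entries_are_real = all(z.imag == 0 for z in re_v + im_v)
rebuilt = tuple(r + 1j * i for r, i in zip(re_v, im_v))

print(re_v, im_v, entries_are_real)
```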

By a basic open set in $\mathbb{C}^m$, we mean a set that is a product in $\mathbb{R}^{2m}$ of bounded
open intervals in each coordinate. In symbols, such a set is centered at some
$v_0 \in \mathbb{C}^m$, and there are positive numbers $\xi_j$ and $\eta_j$ such that the set is


\[
\{\, v \in \mathbb{C}^m \mid |(\mathrm{Re}(v - v_0), u_j)| < \xi_j \text{ and } |(\mathrm{Im}(v - v_0), u_j)| < \eta_j \text{ for } 1 \le j \le m \,\}.
\]


We say that $f : X \to \mathbb{C}^m$ is measurable if the inverse image under $f$ of each
basic open set in $\mathbb{C}^m$ is measurable, i.e., lies in $\mathcal{A}$.


Lemma 5.49. A function $f : X \to \mathbb{C}^m$ is measurable if and only if the inverse
image under $f$ of each open set in $\mathbb{C}^m$ is in $\mathcal{A}$.


PROOF. If the stated condition holds, then the inverse image of any basic open
set is in $\mathcal{A}$, and hence $f$ is measurable. Conversely suppose $f$ is measurable,
and let an open set $U$ in $\mathbb{C}^m$ be given. Then $U$ is the union of a sequence of
basic open sets $U_n$, and the measurability of $f$, in combination with the formula
$f^{-1}(U) = \bigcup_n f^{-1}(U_n)$, shows that $f^{-1}(U)$ is in $\mathcal{A}$.


Proposition 5.50. A function $f : X \to \mathbb{C}^m$ is measurable if and only if $\operatorname{Re} f$
and $\operatorname{Im} f$ are measurable.


PROOF. In view of Lemma 5.49, we can work with arbitrary open sets in place
of basic open sets. If $U$ and $V$ are open sets in $\mathbb{R}^m$, then the product set $U + iV$ is
open in $\mathbb{C}^m$, and $f^{-1}(U + iV) = (\operatorname{Re} f)^{-1}(U) \cap (\operatorname{Im} f)^{-1}(V)$. It is immediate
that measurability of $\operatorname{Re} f$ and $\operatorname{Im} f$ implies measurability of $f$. Conversely if we
specialize this formula to $V = \mathbb{R}^m$, then we see that measurability of $f$ implies
measurability of $\operatorname{Re} f$. Similarly if we specialize to $U = \mathbb{R}^m$, then we see that
measurability of $f$ implies measurability of $\operatorname{Im} f$.


Proposition 5.51. The following conditions on a function $f : X \to \mathbb{C}^m$ are
equivalent:


(a) $f$ is measurable,
(b) $(f, v)$ is measurable for each $v$ in $\mathbb{C}^m$,
(c) $(f, u_j)$ is measurable for $1 \le j \le m$.


REMARKS. When infinite-dimensional ranges are used in more advanced
texts, (a) is summarized by saying that $f$ is “strongly measurable,” and (b) is
summarized by saying that $f$ is “weakly measurable.”

PROOF. Suppose (a) holds. The function in (b) is the composition of $f$ followed
by the continuous function $(\,\cdot\,, v)$ from $\mathbb{C}^m$ to $\mathbb{C}$. The inverse image of an open
set in $\mathbb{C}$ is then open in $\mathbb{C}^m$, and the inverse image of the latter open set under
$f$ is in $\mathcal{A}$. This proves (b). Condition (b) trivially implies condition (c). If (c)
holds, then Proposition 5.50 shows that $(\operatorname{Re} f, u_j)$ and $(\operatorname{Im} f, u_j)$ are measurable
from $X$ into $\mathbb{R}$. Thus the inverse image of any open interval under any of these
$2m$ functions on $X$ is in $\mathcal{A}$. The inverse image of a basic open set in $\mathbb{C}^m$ under $f$
is the intersection of $2m$ such sets in $\mathcal{A}$ and is therefore in $\mathcal{A}$. Hence (a) holds.


Proposition 5.52. Measurability of vector-valued functions has the following
properties:


(a) If $f : X \to \mathbb{C}^m$ and $g : X \to \mathbb{C}^m$ are measurable, then so is $f + g$ as a
function from $X$ to $\mathbb{C}^m$.

(b) If $f : X \to \mathbb{C}^m$ is measurable and $c$ is in $\mathbb{C}$, then $cf$ is measurable as a
function from $X$ to $\mathbb{C}^m$.

(c) If $f : X \to \mathbb{C}^m$ is measurable, then so is $\bar f : X \to \mathbb{C}^m$.

(d) If $f : X \to \mathbb{C}$ and $g : X \to \mathbb{C}$ are measurable, then so is $fg : X \to \mathbb{C}$.

(e) If $f : X \to \mathbb{C}^m$ is measurable, then $|f| : X \to [0, +\infty)$ is measurable.

(f) If $\{f_n\}$ is a sequence of measurable functions from $X$ into $\mathbb{C}^m$ converging
pointwise to a function $f : X \to \mathbb{C}^m$, then $f$ is measurable.


PROOF. Conclusions (a) through (e) may all be proved in the same way. It
will be enough to illustrate the technique with (a). We can write the function
$x \mapsto f(x) + g(x)$ as a composition of $x \mapsto (f(x), g(x))$ followed by addition
$(a, b) \mapsto a + b$. Let an open set in $\mathbb{C}^m$ be given. The inverse image under addition
is open in $\mathbb{C}^m \times \mathbb{C}^m$, since addition is continuous (Proposition 2.28). The inverse
image of a product $U \times V$ of open sets in $\mathbb{C}^m \times \mathbb{C}^m$ under $x \mapsto (f(x), g(x))$ is
$f^{-1}(U) \cap g^{-1}(V)$, which is in $\mathcal{A}$ because $f$ and $g$ are measurable, and therefore
the inverse image of any open set in $\mathbb{C}^m \times \mathbb{C}^m$ under $x \mapsto (f(x), g(x))$ is in $\mathcal{A}$.
This handles (a), and (b) through (e) are similar.

For (f), we apply Proposition 5.50 to $f$, and then we apply the equivalence
of (a) and (c) of Proposition 5.51 for $\operatorname{Re} f$ and $\operatorname{Im} f$. In this way the result is
reduced to the real-valued scalar case, which is known from Corollary 5.10.


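If $E$ is a measurable subset of $X$, we say that a function $f : X \to \mathbb{C}$ is
integrable on $E$ if $\operatorname{Re} f$ and $\operatorname{Im} f$ are integrable on $E$, and in this case we define

\[
\int_E f \, d\mu = \int_E \operatorname{Re} f \, d\mu + i \int_E \operatorname{Im} f \, d\mu .
\]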


Proposition 5.53. Let $E$ be a measurable subset of $X$. Integrability on $E$ of
functions from $X$ to $\mathbb{C}$ has the following properties:

(a) If $f$ and $g$ are functions from $X$ into $\mathbb{C}$ that are integrable on $E$, then $f + g$
is integrable on $E$, and $\int_E (f + g) \, d\mu = \int_E f \, d\mu + \int_E g \, d\mu$.

(b) If $f$ is a function from $X$ into $\mathbb{C}$ that is integrable on $E$ and if $c$ is in $\mathbb{C}$,
then $cf$ is integrable on $E$, and $\int_E cf \, d\mu = c \int_E f \, d\mu$.

(c) If $f$ is a measurable function from $X$ into $\mathbb{C}$ such that $|f|$ is integrable on
$E$, then $f$ is integrable on $E$, and $\left| \int_E f(x) \, d\mu(x) \right| \le \int_E |f(x)| \, d\mu(x)$.

(d) (Dominated convergence) Let $f_n$ be a sequence of measurable functions
from $X$ into $\mathbb{C}$ integrable on $E$ and converging pointwise to $f$. If there is a
measurable function $g : X \to [0, +\infty]$ that is integrable on $E$ and has $|f_n(x)| \le
g(x)$ for all $x$ in $E$, then $f$ is integrable on $E$, $\lim_n \int_E f_n \, d\mu$ exists in $\mathbb{C}$, and
$\lim_n \int_E f_n \, d\mu = \int_E f \, d\mu$.


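PROOF. Conclusion (a) is immediate from the definitions, and so is (b) for real
scalars. Taking (a) and (b) into account, we see that (b) holds if it holds for $c = i$.
We have $if = -\operatorname{Im} f + i \operatorname{Re} f$. If $f$ is integrable, then $-\operatorname{Im} f$ and $\operatorname{Re} f$ are
integrable, and hence $if$ is integrable. Then

\[
i \int_E f \, d\mu = i \Big( \int_E \operatorname{Re} f \, d\mu + i \int_E \operatorname{Im} f \, d\mu \Big)
= \int_E (-\operatorname{Im} f) \, d\mu + i \int_E (\operatorname{Re} f) \, d\mu = \int_E if \, d\mu ,
\]

and hence (b) is proved.

In (c), $f$ is integrable on $E$ because $\operatorname{Re} f$ and $\operatorname{Im} f$ are measurable by
Proposition 5.50 and are dominated in absolute value by the integrable function $|f|$.
Choose $c$ with $|c| = 1$ such that $c \int_E f \, d\mu$ is real and $\ge 0$. Application of (b)
and Proposition 5.16 gives

\[
\Big| \int_E f \, d\mu \Big| = c \int_E f \, d\mu = \int_E cf \, d\mu = \int_E \operatorname{Re}(cf) \, d\mu \le \int_E |cf| \, d\mu = \int_E |f| \, d\mu .
\]

Finally (d) follows by applying the Dominated Convergence Theorem (Theorem
5.30) to $\operatorname{Re} f_n$ and $\operatorname{Im} f_n$ separately and then combining the results.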


We turn now to the matter of integrability of vector-valued functions, together 
with the value of the integral. One way of proceeding is to go back and adapt 
the theory in Sections 3-4 to work directly with vector-valued functions and 
approximations by vector-valued simple functions. This approach is useful if 
at some stage one wants systematically to allow infinite-dimensional vectors as 
values. Examples of this situation will arise in this book, but there are not enough 
examples to justify an abstract treatment. One important example arises in the next 
section with functions of the form f(x, y), which can be regarded as functions 
of x that take values in a space of functions of y. 

Thus we use an abstract definition of integrability that is appropriate only to
the case of finite-dimensional range. If $E$ is a measurable subset of $X$, we say
that a function $f : X \to \mathbb{C}^m$ is integrable on $E$ if the complex-valued functions
$(f, u_j)$ are integrable on $E$ for each $u_j$ in the standard basis, and in this case we
define

\[
\int_E f \, d\mu = \sum_{j=1}^m \Big( \int_E (f, u_j) \, d\mu \Big) u_j .
\]

Proposition 5.54. Let $E$ be a measurable subset of $X$. Integrability of vector-
valued functions on $E$ satisfies the following properties:


(a) If $f$ and $g$ are functions from $X$ into $\mathbb{C}^m$ that are integrable on $E$, then
$f + g$ is integrable on $E$, and $\int_E (f + g) \, d\mu = \int_E f \, d\mu + \int_E g \, d\mu$.

(b) If $f$ is a function from $X$ into $\mathbb{C}^m$ that is integrable on $E$, then $cf$ is
integrable on $E$, and $\int_E cf \, d\mu = c \int_E f \, d\mu$.

(c) A function $f : X \to \mathbb{C}^m$ is integrable on $E$ if and only if $\operatorname{Re} f$ and $\operatorname{Im} f$
are integrable on $E$, and then $\int_E f \, d\mu = \int_E \operatorname{Re} f \, d\mu + i \int_E \operatorname{Im} f \, d\mu$.

(d) If $f$ is a function from $X$ into $\mathbb{C}^m$ that is integrable on $E$ and if $v$ is a
member of $\mathbb{C}^m$, then $x \mapsto (f(x), v)$ is integrable on $E$ and $\int_E (f(x), v) \, d\mu(x) =
\big( \int_E f(x) \, d\mu(x), v \big)$.

(e) If $f$ is a measurable function from $X$ into $\mathbb{C}^m$ such that $|f|$ is integrable on
$E$, then $f$ is integrable on $E$, and $\left| \int_E f(x) \, d\mu(x) \right| \le \int_E |f(x)| \, d\mu(x)$.

(f) (Dominated convergence) Let $f_n$ be a sequence of measurable functions
from $X$ into $\mathbb{C}^m$ integrable on $E$ and converging pointwise to $f$. If there is a
measurable function $g : X \to [0, +\infty]$ that is integrable on $E$ and has $|f_n(x)| \le
g(x)$ for all $x$ in $E$, then $f$ is integrable on $E$, $\lim_n \int_E f_n \, d\mu$ exists in $\mathbb{C}^m$, and
$\lim_n \int_E f_n \, d\mu = \int_E f \, d\mu$.


PROOF. All of the relevant questions about measurability are addressed by
Propositions 5.50 and 5.52. Conclusions (a), (b), (c), and (f) about integrability
are immediate from Proposition 5.53.

For (d), let $v = \sum_j c_j u_j$ with each $c_j$ in $\mathbb{C}$. Since $f$ is by assumption integrable,
$(f, v) = (f, \sum_j c_j u_j) = \sum_j \bar c_j (f, u_j)$ exhibits $(f, v)$ as a linear combination
of functions integrable on $E$. Therefore $(f, v)$ is integrable on $E$. To obtain
the formula asserted in (d), we first consider $v = u_i$. Then the definition of
$\int_E f \, d\mu$ gives $\big( \int_E f \, d\mu, u_i \big) = \big( \sum_j \big( \int_E (f, u_j) \, d\mu \big) u_j, u_i \big) = \int_E (f, u_i) \, d\mu$.
Multiplying by $\bar c_i$ and adding, we obtain $\big( \int_E f \, d\mu, v \big) = \int_E (f, v) \, d\mu$. This
proves (d).

For (e), let $f : X \to \mathbb{C}^m$ be integrable on $E$. The asserted inequality is trivial
if $\int_E f \, d\mu = 0$. Otherwise, for every $v$ in $\mathbb{C}^m$,

\[
\begin{aligned}
\Big| \Big( \int_E f \, d\mu, v \Big) \Big| &= \Big| \int_E (f, v) \, d\mu \Big| && \text{by (d)} \\
&\le \int_E |(f, v)| \, d\mu && \text{by Proposition 5.53c} \\
&\le |v| \int_E |f| \, d\mu && \text{by Proposition 5.16 and the Schwarz inequality.}
\end{aligned}
\]

Taking $v = \int_E f \, d\mu$ gives $\left| \int_E f \, d\mu \right|^2 \le \left| \int_E f \, d\mu \right| \int_E |f| \, d\mu$. Since $\int_E f \, d\mu$
has been assumed nonzero, we can divide by its magnitude, and then (e) follows.

9. $L^1$, $L^2$, $L^\infty$, and Normed Linear Spaces


Let $(X, \mathcal{A}, \mu)$ be a measure space. In this section we introduce the spaces $L^1(X)$,
$L^2(X)$, and $L^\infty(X)$. Roughly speaking, these will be vector spaces of functions on
$X$ with suitable integrability properties. More precisely the actual vector spaces
of functions will form pseudometric spaces, and the spaces $L^1(X)$, $L^2(X)$, and
$L^\infty(X)$ will be the corresponding metric spaces obtained from the construction
of Proposition 2.12. They will all turn out to be vector spaces over $\mathbb{R}$ or $\mathbb{C}$. It
will matter little whether the scalars for these vector spaces are real or complex.
When we need to refer to operations with scalars, we may use the symbol $\mathbb{F}$ to
denote $\mathbb{R}$ or $\mathbb{C}$, and we call $\mathbb{F}$ the field of scalars. We shall make explicit mention
of $\mathbb{R}$ or $\mathbb{C}$ in any situation in which it is necessary to insist on a particular one of
$\mathbb{R}$ or $\mathbb{C}$.

The three spaces we will construct will all be obtained by introducing “pseudo-
norms” in vector spaces of measurable functions. A pseudonorm on a vector
space $V$ is a function $\|\cdot\|$ from $V$ to $[0, +\infty)$ such that*

(i) $\|x\| \ge 0$ for all $x \in V$,
(ii) $\|cx\| = |c| \, \|x\|$ for all scalars $c$ and all $x \in V$,
(iii) (triangle inequality) $\|x + y\| \le \|x\| + \|y\|$ for all $x$ and $y$ in $V$.

We encountered pseudonorms earlier in connection with pseudo inner-product
spaces; in Proposition 2.3 we saw how to form a pseudonorm from a pseudo
inner product. However, only the pseudonorm for $L^2(X)$ arises from a pseudo
inner product in the construction of $L^1$, $L^2$, and $L^\infty$.

The definitions of the pseudonorms in these three instances are

\[
\begin{aligned}
\|f\|_1 &= \int_X |f| \, d\mu && \text{for } L^1(X), \\
\|f\|_2 &= \Big( \int_X |f|^2 \, d\mu \Big)^{1/2} && \text{for } L^2(X), \\
\|f\|_\infty &= \text{“essential supremum” of } f && \text{for } L^\infty(X).
\end{aligned}
\]

Once we have defined “essential supremum,” all the above expressions are mean-
ingful for any measurable function $f$ from $X$ to the scalars, and the vector space $V$
in each of the cases is the space of all measurable functions from $X$ to the scalars
such that the indicated pseudonorm is finite. In other words, $V$ consists of the
integrable functions on $X$ in the case of $L^1(X)$, the square-integrable functions
on $X$ in the case of $L^2(X)$, and the “essentially bounded” functions on $X$ in the
case of $L^\infty(X)$.

We need to check that $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ are indeed pseudonorms and
that the spaces $V$ are vector spaces in each case.


* The word “seminorm” is a second name for a function with these properties and is generally
used in the context of a family of such functions. We shall not use the word “seminorm” in this text.

For $L^1(X)$, properties (i) and (ii) are immediate from the definition. For (iii),
we have $|f(x) + g(x)| \le |f(x)| + |g(x)|$ for all $x$ and therefore $\|f + g\|_1 =
\int_X |f + g| \, d\mu \le \int_X |f| \, d\mu + \int_X |g| \, d\mu = \|f\|_1 + \|g\|_1$.

For $L^2(X)$, let $V$ be the space of all square-integrable functions on $X$. The
space $V$ is certainly closed under scalar multiplication; let us see that it is closed
under addition. If $f$ and $g$ are in $V$, then we have

\[
\begin{aligned}
(|f(x)| + |g(x)|)^2 &\le \big( \max\{|f(x)|, |g(x)|\} + \max\{|f(x)|, |g(x)|\} \big)^2 \\
&= 4 \max\{|f(x)|^2, |g(x)|^2\} \le 4|f(x)|^2 + 4|g(x)|^2
\end{aligned}
\]

for every $x$ in $X$. Integrating over $X$, we see that $f + g$ is in $V$ if $f$ and $g$ are
in $V$. Also, the left side is $\ge 4|f(x)| \, |g(x)|$, and it follows that $f \bar g$ is integrable
whenever $f$ and $g$ are in $V$. Then the definition $(f, g)_2 = \int_X f \bar g \, d\mu$ makes $V$
into a pseudo inner-product space in the sense of Section II.1. Hence Proposition
2.3 shows that the function $\|\cdot\|_2$ with $\|f\|_2 = (f, f)_2^{1/2}$ is a pseudonorm on $V$.

For $L^\infty(X)$, we say that $f$ is essentially bounded if there is a real number $M$
such that $|f(x)| \le M$ almost everywhere $[d\mu]$. Let us call such an $M$ an essential
bound for $|f|$. When $f$ is essentially bounded, we define $\|f\|_\infty$ to be the infimum
of all essential bounds for $|f|$. This infimum is itself an essential bound, since the
countable union of sets of measure 0 is of measure 0. The infimum of the essential
bounds is called the essential supremum of $|f|$. Certainly $\|\cdot\|_\infty$ satisfies (i) and
(ii). If $|f|$ is bounded a.e. by $M$ and if $|g|$ is bounded a.e. by $N$, then $|f + g|$ is
bounded everywhere by $|f| + |g|$, which is bounded a.e. by $M + N$. It follows that
$f + g$ is essentially bounded and $\|f + g\|_\infty \le \| \, |f| + |g| \, \|_\infty \le \|f\|_\infty + \|g\|_\infty$.
So (iii) holds for $\|\cdot\|_\infty$.
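
The difference between the essential supremum and the ordinary supremum can be seen in a small worked example. The sketch below assumes Lebesgue measure on $[0,1]$, which is not constructed in this section, and the function $f$ is a hypothetical choice made only for illustration.

```latex
% Assumptions (not from the text): X = [0,1], \mu = Lebesgue measure,
% and f is the indicator function of the rationals in [0,1].
\[
f = I_{\mathbb{Q} \cap [0,1]}, \qquad
\sup_{x \in X} |f(x)| = 1, \qquad
\mu(\{ x \in X : |f(x)| > 0 \}) = \mu(\mathbb{Q} \cap [0,1]) = 0 .
\]
% Hence every M \ge 0 is an essential bound for |f|, and the infimum is
\[
\|f\|_\infty = 0 .
\]
```

Under these assumptions the infimum of the essential bounds is attained at $M = 0$, even though $f$ takes the value 1 at infinitely many points.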

A real or complex vector space with a pseudonorm is a pseudo normed linear
space. Such a space $V$ becomes a pseudometric space by the definition $d(f, g) =
\|f - g\|$, according to the proof of Proposition 2.3. Proposition 2.12 shows that
if we define two members $f$ and $g$ of $V$ to be equivalent whenever $d(f, g) = 0$,
then the result is an equivalence relation and the function $d$ descends to a well-
defined metric on the set of equivalence classes. If we take into account the
vector space structure on $V$, then we can see that the operations of addition and
scalar multiplication descend to the set of equivalence classes, and the set of
equivalence classes is then also a vector space. The argument for addition is that
if $d(f_1, f_2) = 0$ and $d(g_1, g_2) = 0$, then $d(f_1 + g_1, f_2 + g_2)$ is 0 because

\[
\begin{aligned}
d(f_1 + g_1, f_2 + g_2) &= \|(f_1 + g_1) - (f_2 + g_2)\| = \|(f_1 - f_2) + (g_1 - g_2)\| \\
&\le \|f_1 - f_2\| + \|g_1 - g_2\| = d(f_1, f_2) + d(g_1, g_2) = 0 .
\end{aligned}
\]

The argument for scalar multiplication is similar, and one readily checks that the
space of equivalence classes is a vector space.

This construction is to be applied to the spaces $V$ we formed in connection
with integrability, square integrability, and essential boundedness. The spaces of
equivalence classes in the respective cases are called $L^1(X)$, $L^2(X)$, and $L^\infty(X)$.
These spaces of equivalence classes are pseudo normed linear spaces with the
additional property that $\|f\| = 0$ only for the 0 element of the vector space.
If there is any possibility of confusion, we may write $L^1(\mu)$ or $L^1(X, \mu)$ or
$L^1(X, \mathcal{A}, \mu)$ in place of $L^1(X)$, and similarly for $L^2$ and $L^\infty$.

A pseudo normed linear space is called a normed linear space if $\|f\| = 0$
implies $f$ is the 0 element of the vector space. Thus $L^1(X)$, $L^2(X)$, and $L^\infty(X)$
are normed linear spaces.

In practice, in order to avoid clumsiness, one sometimes relaxes the terminol-
ogy and works with the members of $L^1(X)$, $L^2(X)$, and $L^\infty(X)$ as if they were
functions, saying, “Let the function $f$ be in $L^1(X)$” or “Let $f$ be an $L^1$ function.”
There is little possibility of ambiguity in using such expressions.

The 1-dimensional vector space consisting of the field of scalars $\mathbb{F}$ with absolute
value as norm is an example of a normed linear space. Apart from this and $\mathbb{F}^m$,
we have encountered one other important normed linear space thus far in the
book. This is the space $B(S)$ of bounded functions on a nonempty set $S$. It
has various vector subspaces of interest, such as the space $C(S)$ of bounded
continuous functions in the case that $S$ is a metric space. The norm for $B(S)$ is
the supremum norm or the uniform norm defined by

\[
\|f\|_{\sup} = \sup_{s \in S} |f(s)| .
\]

The corresponding metric is

\[
d(f, g) = \|f - g\|_{\sup} = \sup_{s \in S} |f(s) - g(s)| ,
\]

and this agrees with the definition of the metric in the example in Chapter II.
Proposition 2.44 shows that the metric space $B(S)$ is complete. Any vector
subspace of $B(S)$ is a normed linear space under the restriction of the supremum
norm to the subspace.

In working with specific normed linear spaces, we shall often be interested in 
seeing whether a particular subset of the space is dense. In checking denseness, 
the following proposition about an arbitrary normed linear space is sometimes 
helpful. The intersection of vector subspaces of X is a vector subspace, and the 
intersection of closed sets is closed. Therefore it makes sense to speak of the 
smallest closed vector subspace containing a given subset S of X. 


Proposition 5.55. If $X$ is a normed linear space with norm $\|\cdot\|$ and with $\mathbb{F}$
as field of scalars, then
(a) addition is a continuous function from $X \times X$ to $X$,
(b) scalar multiplication is a continuous function from $\mathbb{F} \times X$ to $X$,
(c) the closure of any vector subspace of $X$ is a vector subspace,
(d) the set of all finite linear combinations of members of a subset $S$ of $X$ is
dense in the smallest closed vector subspace containing $S$.


PROOF. The formula $\|(x + y) - (x_0 + y_0)\| \le \|x - x_0\| + \|y - y_0\|$ shows
continuity of addition because it says that if $x$ is within distance $\epsilon/2$ of $x_0$ and $y$ is
within distance $\epsilon/2$ of $y_0$, then $x + y$ is within distance $\epsilon$ of $x_0 + y_0$. Similarly the
formula $\|cx - c_0 x_0\| \le \|c(x - x_0)\| + \|(c - c_0) x_0\| = |c| \, \|x - x_0\| + |c - c_0| \, \|x_0\|$
shows that $\|cx - c_0 x_0\| \le \delta(|c_0| + 1) + \delta \|x_0\|$ as soon as $\delta \le 1$, $|c - c_0| \le \delta$, and
$\|x - x_0\| \le \delta$. If $\epsilon$ with $0 < \epsilon \le 1$ is given and if we set $\delta = (|c_0| + 1 + \|x_0\|)^{-1} \epsilon$,
then we see that $|c - c_0| \le \delta$ and $\|x - x_0\| \le \delta$ together imply $\|cx - c_0 x_0\| \le \epsilon$.
Hence scalar multiplication is continuous. This proves (a) and (b).

From (a) and (b) it follows that if $x_n \to x$ and $y_n \to y$ in $X$ and $c_n \to c$ in $\mathbb{F}$,
then $x_n + y_n \to x + y$ and $c_n x_n \to cx$. This proves (c).

For (d), the smallest closed vector subspace $V_1$ containing $S$ certainly contains
the closure $V_2$ of the set of all finite linear combinations of members of $S$. Part (c)
shows that $V_2$ is a closed vector subspace, and hence the definition of $V_1$ implies
that $V_1$ is contained in $V_2$. Therefore $V_1 = V_2$, and (d) is proved.


Proposition 5.56. Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $p = 1$ or $p = 2$.
Then every indicator function of a set of finite measure is in $L^p(X)$, and the
smallest closed subspace of $L^p(X)$ containing all such indicator functions is
$L^p(X)$ itself.


REMARK. Proposition 5.55d allows us to conclude from this that the set of
simple functions built from sets of finite measure lies in both $L^1(X)$ and $L^2(X)$
and is dense in each. It of course lies in $L^\infty(X)$ as well, but it is dense in $L^\infty(X)$
if and only if $\mu(X)$ is finite.
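
To see concretely why density in $L^\infty(X)$ can fail when $\mu(X)$ is infinite, one can look at a constant function. The sketch below assumes $X = \mathbb{R}$ with Lebesgue measure, which is an illustrative choice and not part of the Remark; the symbols $s$ and $F$ are likewise introduced only for this example.

```latex
% Assumptions (not from the text): X = \mathbb{R}, \mu = Lebesgue measure.
% Let s be any simple function built from sets of finite measure, so that
% s vanishes outside a set F with \mu(F) < +\infty.
\[
|1 - s(x)| = 1 \quad \text{for all } x \in \mathbb{R} \setminus F,
\qquad \mu(\mathbb{R} \setminus F) = +\infty > 0 ,
\]
% so no number smaller than 1 is an essential bound for |1 - s|, and
\[
\|1 - s\|_\infty \ge 1 .
\]
```

Thus, under these assumptions, the constant function 1 stays at distance at least 1 from every such simple function.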


PROOF. If $E$ is a set of finite measure, then the equality $\int_X (I_E)^p \, d\mu = \mu(E)$
shows that $I_E$ is in $L^p$ for $p = 1$ and $p = 2$.

In the reverse direction let $V$ be the smallest closed vector subspace of $L^p$
containing all indicator functions of sets of finite measure. Suppose that $s =
\sum_k c_k I_{E_k}$ is the canonical expansion of a simple function $s \ge 0$ in $L^p$ and that
$c_k > 0$. The inequalities $0 \le c_k I_{E_k} \le s$ imply that $c_k I_{E_k}$ is in $L^p$. Hence $I_{E_k}$ is in
$L^p$, and $\mu(E_k)$ is finite. Thus every nonnegative simple function in $L^p$ lies in $V$.

Let $f \ge 0$ be in $L^p$, and let $s_n$ be an increasing sequence of simple functions
$\ge 0$ with pointwise limit $f$. Since $0 \le s_n \le f$, each $s_n$ is in $L^p$. Since $|f - s_n|^p$
has pointwise limit 0 and is dominated pointwise for every $n$ by the integrable
function $|f|^p$, dominated convergence gives $\lim_n \int_X |f - s_n|^p \, d\mu = 0$. Hence
$s_n$ tends to $f$ in $L^p$. Combining this conclusion with the result of the previous
paragraph, we see that every nonnegative $L^p$ function is in $V$. Any $L^p$ function
is a finite linear combination of nonnegative $L^p$ functions, and hence every $L^p$
function lies in $V$.


Let us digress briefly once more from our study of $L^1$, $L^2$, and $L^\infty$ to obtain
two more results about general normed linear spaces. A linear function between 
two normed linear spaces is often called a linear operator. A linear function 
whose range space is the field of scalars is called a linear functional. The 
following equivalence of properties is fundamental and is often used without 
specific reference. 


Proposition 5.57. Let $X$ and $Y$ be normed linear spaces that are both real or
both complex, and let their respective norms be $\|\cdot\|_X$ and $\|\cdot\|_Y$. Then the
following conditions on a linear operator $L : X \to Y$ are equivalent:

(a) $L$ is uniformly continuous on $X$,

(b) $L$ is continuous on $X$,

(c) $L$ is continuous at 0,

(d) $L$ is bounded in the sense that there exists a constant $M$ such that

\[
\|L(x)\|_Y \le M \|x\|_X
\]

for all $x$ in $X$.


PROOF. If $L$ is uniformly continuous on $X$, then $L$ is certainly continuous on
$X$. If $L$ is continuous on $X$, then $L$ is certainly continuous at 0. Thus (a) implies
(b), and (b) implies (c).

If $L$ is continuous at 0, find $\delta > 0$ for $\epsilon = 1$ such that $\|x - 0\|_X \le \delta$
implies $\|L(x) - L(0)\|_Y \le 1$. Here $L(0) = 0$. If a general $x \ne 0$ is given,
then $\|x\|_X \ne 0$, and the properties of the norm give $\|(\delta / \|x\|_X) x\|_X = \delta$. Thus
$\|L((\delta / \|x\|_X) x)\|_Y \le 1$. By the linearity of $L$ and the properties of the norm,
$(\delta / \|x\|_X) \|L(x)\|_Y \le 1$. Therefore $\|L(x)\|_Y \le \delta^{-1} \|x\|_X$, and $L$ is bounded with
$M = \delta^{-1}$. Thus (c) implies (d).

If $L$ is bounded with constant $M$ and if $\epsilon > 0$ is given, let $\delta = \epsilon / M$. Then
$\|x_1 - x_2\|_X < \delta$ implies

\[
\|L(x_1) - L(x_2)\|_Y = \|L(x_1 - x_2)\|_Y \le M \|x_1 - x_2\|_X < \delta M = \epsilon .
\]

Thus (d) implies (a).


If $L : X \to Y$ is a bounded linear operator, then the infimum of all constants
$M$ such that $\|L(x)\|_Y \le M \|x\|_X$ for all $x$ in $X$ is again such a constant, and it is
called the operator norm $\|L\|$ of $L$. Thus it in particular satisfies

\[
\|L(x)\|_Y \le \|L\| \, \|x\|_X \qquad \text{for all } x \text{ in } X.
\]

As a consequence of the way that $L$ and the norms in $X$ and $Y$ interact with scalar
multiplication, the operator norm is given by the formulas

\[
\|L\| = \sup_{\|x\|_X \le 1} \|L(x)\|_Y = \sup_{\|x\|_X = 1} \|L(x)\|_Y
\]

except in the uninteresting case $X = 0$. It is easy to check that the bounded linear
operators from $X$ into $Y$ form a vector space, and the operator norm makes this
vector space into a normed linear space that we denote by $B(X, Y)$. When the
domain and range are the same space $X$, we refer to the members of $B(X, X)$
as bounded linear operators on $X$. The normed linear space $B(X, X)$ has a
multiplication operation given by composition.
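
The supremum formulas for the operator norm can be checked by hand in a finite-dimensional case. The following sketch is only an illustration: it assumes $X = \mathbb{F}^2$ with the Euclidean norm and $Y = \mathbb{F}$, and the scalars $a_1, a_2$ and the vector $a = (a_1, a_2)$ are hypothetical names introduced here.

```latex
% Assumptions (not from the text): X = \mathbb{F}^2 with the Euclidean norm,
% Y = \mathbb{F}, and L(x_1, x_2) = a_1 x_1 + a_2 x_2 with a = (a_1, a_2) \ne 0.
\[
|L(x)| = |a_1 x_1 + a_2 x_2| \le |a| \, |x|
\qquad \text{(Schwarz inequality)},
\]
% so \|L\| \le |a|.  The unit vector x = \bar a / |a| gives
\[
L(\bar a / |a|) = \frac{|a_1|^2 + |a_2|^2}{|a|} = |a| ,
\]
% so the supremum over \|x\| = 1 is attained and
\[
\|L\| = |a| = \big( |a_1|^2 + |a_2|^2 \big)^{1/2} .
\]
```

In this example the infimum of the admissible constants $M$ and the supremum over the unit sphere visibly coincide.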

When $Y$ is the field of scalars $\mathbb{F}$, the space $B(X, \mathbb{F})$ reduces to the space of con-
tinuous linear functionals on $X$. This is called the dual space of $X$ and is denoted
by $X^*$. For example, if $X = L^1(\mu)$, then every member $g$ of $L^\infty(\mu)$ defines a
member $x_g^*$ of $X^*$ by $x_g^*(f) = \int_X fg \, d\mu$ for $f$ in $L^1(\mu)$; the linear functional $x_g^*$
has $\|x_g^*\| \le \|g\|_\infty$. We shall be interested in two kinds of convergence in $X^*$.
One is norm convergence, in which a sequence $\{x_n^*\}$ converges to an element $x^*$
in $X^*$ if $\|x_n^* - x^*\|$ tends to 0. The other is weak-star convergence, in which
$\{x_n^*\}$ converges to $x^*$ weak-star against $X$ if $\lim_n x_n^*(x) = x^*(x)$ for each $x$ in $X$.


Theorem 5.58 (Alaoglu’s Theorem, preliminary form). If X is a separable 
normed linear space, then any sequence in X* that is bounded in norm has a 
subsequence that converges weak-star against X. 


REMARKS. In Chapter VI we shall see that $L^1$ and $L^2$ are separable in the case
of Lebesgue measure on $\mathbb{R}^1$ and in the case of many generalizations of Lebesgue
measure to $N$-dimensional Euclidean space.


PROOF. Let a sequence $\{x_n^*\}_{n=1}^\infty$ be given with $\|x_n^*\| \le M$, and let $\{x_k\}$ be a
countable dense set in $X$. For each $k$, we have $|x_n^*(x_k)| \le \|x_n^*\| \, \|x_k\| \le M \|x_k\|$,
and hence the sequence $\{x_n^*(x_k)\}_{n=1}^\infty$ of scalars is bounded for each fixed $k$. By the
Bolzano-Weierstrass Theorem, $\{x_n^*(x_k)\}_{n=1}^\infty$ has a convergent subsequence. Since
we can pass to a convergent subsequence of any subsequence for any particular $k$,
we can use a diagonal process to pass to a single subsequence $\{x_{n_l}^*\}_{l=1}^\infty$
such that $\lim_l x_{n_l}^*(x_k)$ exists for all $k$.

Now let $x_0$ be arbitrary in $X$, let $\epsilon > 0$ be given, and choose $x_k$ in the dense
set with $\|x_k - x_0\| < \epsilon$. Then

\[
\begin{aligned}
|x_{n_l}^*(x_0) - x_{n_{l'}}^*(x_0)| &\le |x_{n_l}^*(x_0 - x_k)| + |x_{n_l}^*(x_k) - x_{n_{l'}}^*(x_k)| + |x_{n_{l'}}^*(x_k - x_0)| \\
&\le M \|x_0 - x_k\| + |x_{n_l}^*(x_k) - x_{n_{l'}}^*(x_k)| + M \|x_k - x_0\| \\
&< 2M\epsilon + |x_{n_l}^*(x_k) - x_{n_{l'}}^*(x_k)| .
\end{aligned}
\]

Thus $\limsup_{l, l' \to \infty} |x_{n_l}^*(x_0) - x_{n_{l'}}^*(x_0)| \le 2M\epsilon$. Since $\epsilon$ is arbitrary, we conclude that
$\{x_{n_l}^*(x_0)\}_{l=1}^\infty$ is a Cauchy sequence of scalars. It is therefore convergent. Denote
the limit by $x^*(x_0)$, so that $\lim_l x_{n_l}^*(x_0) = x^*(x_0)$ for all $x_0$ in $X$. Since limits
respect addition and multiplication of scalars, $x^*$ is a linear functional on $X$. The
computation $|x^*(x_0)| = |\lim_l x_{n_l}^*(x_0)| = \lim_l |x_{n_l}^*(x_0)| \le \limsup_l \|x_{n_l}^*\| \, \|x_0\| \le
M \|x_0\|$ shows that $x^*$ is bounded. Hence $\{x_{n_l}^*\}_{l=1}^\infty$ converges to $x^*$ weak-star
against $X$.


Now, as promised, we return to $L^1$, $L^2$, and $L^\infty$. The completeness asserted
in the next theorem will turn out to be one of the key advantages of Lebesgue 
integration over Riemann integration. 


Theorem 5.59. Let $(X, \mathcal{A}, \mu)$ be any measure space, and let $p$ be 1, 2,
or $\infty$. Any Cauchy sequence $\{f_k\}$ in $L^p$ has a subsequence $\{f_{k_n}\}$ such that
$\|f_{k_m} - f_{k_n}\|_p \le C_{\min\{m,n\}}$ with $\sum_n C_n < +\infty$. A subsequence $\{f_{k_n}\}$ with this
property is necessarily Cauchy pointwise almost everywhere. If $f$ denotes the
almost-everywhere limit of $\{f_{k_n}\}$, then the original sequence $\{f_k\}$ converges to $f$
in $L^p$. Consequently these three spaces $L^p$, when regarded as metric spaces, are
complete in the sense that every Cauchy sequence converges.


REMARKS. The broad sweep of the theorem is that the spaces $L^1$, $L^2$, and
$L^\infty$ are complete. But the detail is important, too. First of all, the detail
allows us to conclude that a sequence convergent in one of these spaces has
an almost-everywhere convergent subsequence. Second of all, the detail allows
us to conclude that if a sequence of functions is convergent in $L^{p_1}$ and in $L^{p_2}$,
then the limit functions in the two spaces are equal almost everywhere.
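
Passing to a subsequence in the first remark cannot be avoided in general. A standard illustration, sketched below, assumes Lebesgue measure on $[0,1]$ and $p = 1$ or $p = 2$; the specific sequence $f_n$ and its indexing by $n = 2^k + j$ are hypothetical choices made for this example and do not come from the text.

```latex
% Assumptions (not from the text): X = [0,1], \mu = Lebesgue measure, p = 1 or 2.
% For n = 2^k + j with k \ge 0 and 0 \le j < 2^k, put
\[
f_n = I_{[\, j 2^{-k}, \, (j+1) 2^{-k} \,]} ,
\qquad \text{so that} \qquad
\|f_n\|_p^p = \int_X |f_n|^p \, d\mu = 2^{-k} \longrightarrow 0 .
\]
% Thus f_n tends to 0 in L^p.  For each fixed x in [0,1] and each k \ge 2,
% some interval of length 2^{-k} contains x and some other one does not, so
\[
f_n(x) = 1 \ \text{for infinitely many } n
\quad \text{and} \quad
f_n(x) = 0 \ \text{for infinitely many } n ,
\]
% and \{f_n(x)\} converges for no x.  The subsequence with j = 0 is
\[
f_{2^k} = I_{[0, \, 2^{-k}]} \longrightarrow 0 \quad \text{pointwise for every } x \ne 0 .
\]
```

Under these assumptions the full sequence converges in $L^p$ but nowhere pointwise, while a subsequence converges almost everywhere to the $L^p$ limit.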


PROOF. Let $\{f_n\}$ be a Cauchy sequence in $L^p$. Inductively choose integers $n_k$ by
defining $n_0 = 1$ and taking $n_k$ to be any integer $> n_{k-1}$ such that $\|f_m - f_{n_k}\|_p \le
2^{-k}$ for $m \ge n_k$; we can do so since the given sequence is Cauchy. Then the
subsequence $\{f_{n_k}\}$ has the property that $\|f_{n_k} - f_{n_l}\|_p \le 2^{-\min\{k,l\}}$ for all $k \ge 1$
and $l \ge 1$. This proves the first conclusion of the theorem.

Now suppose that we have a sequence $\{f_n\}$ in $L^p$ such that $\|f_n - f_m\|_p \le
C_{\min\{m,n\}}$ with $\sum_n C_n = C < +\infty$. We shall prove that $\{f_n\}$ is Cauchy pointwise
almost everywhere and that if $f$ is its almost-everywhere limit, then $f_n$ tends to
$f$ in $L^p$.

First suppose that $p < \infty$. Let $g_n$ be the function from $X$ to $[0, +\infty]$ given by

\[
g_n = |f_1| + \sum_{k=2}^n |f_k - f_{k-1}| , \tag{$*$}
\]

and define $g(x) = \lim g_n(x)$ pointwise. Then

\[
\begin{aligned}
\Big( \int_X g_n^p \, d\mu \Big)^{1/p} = \|g_n\|_p &\le \|f_1\|_p + \sum_{k=2}^n \|f_k - f_{k-1}\|_p \\
&\le \|f_1\|_p + \sum_{k=2}^n C_{k-1} \le \|f_1\|_p + C .
\end{aligned}
\]

By monotone convergence, we deduce that $\big( \int_X g^p \, d\mu \big)^{1/p} = \|g\|_p$ is finite. Thus
$g$ is finite a.e., and consequently the series

\[
\sum_{k=2}^\infty |f_k(x) - f_{k-1}(x)| \quad \text{converges in } \mathbb{R} \text{ for a.e. } x \ [d\mu]. \tag{$**$}
\]

By redefining the functions $f_k$ on a set of $\mu$ measure 0, we may assume that the
series $(**)$ converges pointwise to a limit in $\mathbb{R}$ for every $x$. Consequently the
series

\[
\sum_{k=2}^\infty \big( f_k(x) - f_{k-1}(x) \big)
\]

is absolutely convergent for all $x$ and must be convergent for all $x$. The partial
sums for the series without the absolute value signs are $f_n(x) - f_1(x)$, and hence
$f(x) = \lim f_n(x)$ exists in $\mathbb{R}$ for every $x$. For every $n$,

\[
|f - f_n| \le \sum_{k=n+1}^\infty |f_k - f_{k-1}| \le g , \tag{$\dagger$}
\]

and we have seen that $g^p$ is integrable. By dominated convergence, we conclude
that $\lim_n \int_X |f - f_n|^p \, d\mu = \int_X \lim_n |f(x) - f_n(x)|^p \, d\mu(x) = 0$. In other words,
$\lim_n \|f - f_n\|_p = 0$. Therefore $f_n$ tends to $f$ in $L^p(\mu)$.

Next suppose that $p = \infty$. Let $\{f_n\}$ be any Cauchy sequence in $L^\infty$. For each
$m$ and $n$, let $E_{mn}$ be the subset of $X$ where $|f_m - f_n| > \|f_m - f_n\|_\infty$, and put
$E = \bigcup_{m,n} E_{mn}$. This set has measure 0. Redefine all functions to be 0 on $E$.
The redefined functions are then uniformly Cauchy, hence uniformly convergent
to some function $f$, and then $f_n$ tends to $f$ in $L^\infty(X)$.

For any $p$, we have shown that the original Cauchy sequence $\{f_n\}$ has a
convergent subsequence $\{f_{n_k}\}$ in $L^p$. Let $f$ be the $L^p$ limit of the subsequence.
Given $\epsilon > 0$, choose $N$ such that $n \ge m \ge N$ implies $\|f_n - f_m\|_p \le \epsilon$, and then
choose $K$ such that $\|f_{n_k} - f\|_p \le \epsilon$ for $k \ge K$. Fix $k \ge K$ with $n_k \ge N$. Taking
$m = n_k$, we see that $\|f_n - f\|_p \le \|f_n - f_{n_k}\|_p + \|f_{n_k} - f\|_p \le 2\epsilon$ whenever
$n \ge n_k$. Thus $\{f_n\}$ converges to $f$. This completes the proof of the theorem.




In Section 8 we introduced integration of functions with values in $\mathbb{R}^m$ or $\mathbb{C}^m$.
The definitions of $L^1$, $L^2$, and $L^\infty$ may be extended to include such functions,
and we write $L^1(X, \mathbb{C}^m)$, for example, to indicate that the functions in question
take values in $\mathbb{C}^m$. In the definitions any expression $|f(x)|$ or $|f|$ that arises in
the definition and refers to absolute value in the scalar-valued case is now to be
understood as referring to the norm on the vector space where the functions take
their values. The vector-valued $L^1$, $L^2$, and $L^\infty$ spaces are further normed linear
spaces, and one readily checks that Theorem 5.59 with the above proof applies
to them because the range spaces are complete.

The triangle inequality for a pseudo normed linear space says that the norm 
of the sum of two elements is less than or equal to the sum of the norms, and of 
course the inequality instantly extends to a sum of any finite number of elements. 
But what about an integral of elements? In the case that the linear space is one 
of the precursor spaces “$V$” for $L^1$, $L^2$, or $L^\infty$, the setting is that of functions
of two variables. One of the variables corresponds to the measure space under 
study, and the other corresponds to the indexing set for the integral of the norms. 
Thus we could, if we wanted, force the situation into the mold of vector-valued 
functions whose values are in a space of functions. But it is not necessary to do 
so, and we do not. Here is the theorem. 


Theorem 5.60 (Minkowski’s inequality for integrals). Let $(X, \mathcal{A}, \mu)$ and
$(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces, and put $p = 1$, 2, or $\infty$. If $f$ is measurable
on $X \times Y$, then

\[
\Big\| \int_X f(x, y) \, d\mu(x) \Big\|_{p, d\nu(y)} \le \int_X \|f(x, y)\|_{p, d\nu(y)} \, d\mu(x)
\]

in the following sense: The integrand on the right side is measurable. If the
integral on the right is finite, then for almost every $y$ $[d\nu]$ the integral on the left
is defined; when it is redefined to be 0 for the exceptional $y$’s, then the formula
holds.
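
One way to read the statement is as a continuous version of the triangle inequality for finite sums. The sketch below is only an illustration under added assumptions: $X$ is taken to be a finite set with counting measure, and the functions $f_1, \dots, f_n$ on $Y$ are hypothetical names for the slices of $f$.

```latex
% Assumptions (not from the text): X = \{1, \dots, n\} with counting measure \mu,
% and f(i, y) = f_i(y) with each f_i measurable on Y.
\[
\int_X f(x, y) \, d\mu(x) = \sum_{i=1}^n f_i(y),
\qquad
\int_X \|f(x, y)\|_{p, d\nu(y)} \, d\mu(x) = \sum_{i=1}^n \|f_i\|_p ,
\]
% so that the inequality of the theorem specializes to
\[
\Big\| \sum_{i=1}^n f_i \Big\|_p \le \sum_{i=1}^n \|f_i\|_p .
\]
```

In this special case the theorem reduces to the finite triangle inequality for the pseudonorm $\|\cdot\|_p$ on functions of $y$.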


REMARK. An extension of this theorem to values of $p$ other than 1, 2, $\infty$ will
be given in Chapter IX, and that result will have the same name.


PROOF. The right side of the integral formula is unchanged if we replace $f$ by
$|f|$, and thus we may assume that $f \ge 0$ without loss of generality. If $p = 1$,
then the formula for $f \ge 0$ reads

\[
\int_Y \Big[ \int_X f(x, y) \, d\mu(x) \Big] d\nu(y) \le \int_X \Big[ \int_Y f(x, y) \, d\nu(y) \Big] d\mu(x) .
\]

In fact, equality holds, and the result just amounts to Fubini’s Theorem (Theorem
5.47).

Let $p = 2$. We have

\[
\|f(x, y)\|_{2, d\nu(y)}^2 = \int_Y |f(x, y)|^2 \, d\nu(y) ,
\]

and this is measurable by Fubini’s Theorem. Hence $\|f(x, y)\|_{2, d\nu(y)}$ is measur-
able. The idea for proving the inequality in the statement of the theorem is to
imitate the argument that derives the triangle inequality for $L^2$ from the Schwarz
inequality. That earlier argument is

\[
\|g + h\|_2^2 = \|g\|_2^2 + 2 \operatorname{Re} (g, h)_2 + \|h\|_2^2 \le \|g\|_2^2 + 2 \|g\|_2 \|h\|_2 + \|h\|_2^2 .
\]

The adapted argument is

\[
\begin{aligned}
\Big\| \int_X f(x, y) \, d\mu(x) \Big\|_{2, d\nu(y)}^2
&= \int_Y \int_X f(x, y) \, d\mu(x) \, \overline{\int_X f(x', y) \, d\mu(x')} \, d\nu(y) \\
&= \int_{X \times X} \Big[ \int_Y f(x, y) \overline{f(x', y)} \, d\nu(y) \Big] d\mu(x) \, d\mu(x') \\
&\le \int_{X \times X} \|f(x, y)\|_{2, d\nu(y)} \, \|f(x', y)\|_{2, d\nu(y)} \, d\mu(x) \, d\mu(x') \\
&= \Big( \int_X \|f(x, y)\|_{2, d\nu(y)} \, d\mu(x) \Big)^2 ,
\end{aligned}
\]

the second and third lines following from Fubini’s Theorem and the Schwarz
inequality.

Let $p = \infty$. This is the hard case of the proof. We proceed in three steps. The
first step is to prove the asserted measurability of $\|f(x, y)\|_{\infty, d\nu(y)}$, and we do so
by first handling simple functions and then passing to the limit. If $s = \sum_{n=1}^N c_n I_{E_n}$
is the canonical expansion of a simple function $s \ge 0$ on $X \times Y$ and if $x$ is fixed,
then $\|s(x, y)\|_{\infty, d\nu(y)} = \max\{ c_n \mid \nu((E_n)_x) > 0 \}$. In other words, if $k_n$ is the indi-
cator function of the set $\{ x \in X \mid \nu((E_n)_x) > 0 \}$, then $\|s(x, y)\|_{\infty, d\nu(y)} = \max\{c_1 k_1(x), \dots, c_N k_N(x)\}$.
Each function $c_n k_n$ is measurable by Lemma 5.44, and the pointwise maximum is
measurable by Corollary 5.9. Returning to our function $f \ge 0$, we use Proposition
5.11 to choose an increasing sequence $\{s_n\}$ of nonnegative simple functions with
pointwise limit $f$. We prove that $\|s_n(x, y)\|_{\infty, d\nu(y)}$ increases to $\|f(x, y)\|_{\infty, d\nu(y)}$
for each $x$, and then the measurability follows from Corollary 5.10. Since $x$ is fixed
in this step, let us drop it and consider an increasing sequence $\{s_n\}$ of nonnegative
measurable functions on $Y$ with limit $f$ on $Y$; we are to show that $\|f\|_\infty =
\lim \|s_n\|_\infty$. The numbers $\|s_n\|_\infty$ are monotone increasing and are $\le \|f\|_\infty$. Thus
$\lim \|s_n\|_\infty \le \|f\|_\infty$. Arguing by contradiction, suppose that equality fails and
that $\lim \|s_n\|_\infty \le M < M + \epsilon < \|f\|_\infty$. Then $\{ y \mid s_n(y) \ge M + \epsilon \}$ has measure 0
for every $n$, and so does $\bigcup_n \{ y \mid s_n(y) \ge M + \epsilon \}$, by complete additivity. On
the other hand, $\{ y \mid f(y) > M + \epsilon \}$ is a subset of this union, and it has positive
measure since $M + \epsilon < \|f\|_\infty$. Thus we have a contradiction and conclude that
$\lim \|s_n\|_\infty = \|f\|_\infty$. Consequently $\|f(x, y)\|_{\infty, d\nu(y)}$ is measurable, as asserted.
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The second step is to prove that any measurable function F > 0 on Y has 
IF ll. = Sup, | & Fg dv|, where the supremum is taken over all g > 0 with 
ligll, < 1. Certainly any such g has | fy Fgdv| < |IFlln Jy gdv = WF loo: 
and therefore sup, | J, y Ped v| < ||F'||,,- For the reverse inequality, let Jz be the 


indicator function of a set of finite positive measure, and put g = v(E)~'Ig. Then 
J, Fg dv = v(E)"! J, F dv = infg(F). If mis less than || F'||,,, then the set E 
where F is > m has positive measure, and the inequality reads m < f, Fg dv 
for the associated g. Hence m < sup, J y Fg dv. Taking the supremum of such 


m’s, we obtain || F'||,, < sup, | Jy Fg dv|, and the reverse inequality is proved. 
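A small check of the second step may help fix ideas. The two-point space, the counting measure, and the particular values 3 and 5 below are assumptions chosen only for illustration; they do not come from the text.

```latex
% Assumed toy setting: Y = {1,2} with counting measure \nu, F(1) = 3, F(2) = 5.
\begin{align*}
\|F\|_\infty &= 5,\\
\int_Y Fg\,d\nu &= 3g(1) + 5g(2) \le 5\bigl(g(1)+g(2)\bigr) \le 5
   \quad\text{whenever } g \ge 0 \text{ and } g(1)+g(2) \le 1,\\
\int_Y Fg\,d\nu &= 5
   \quad\text{for } g = \nu(E)^{-1} I_E \text{ with } E = \{2\} = \{y \mid F(y) \ge 4\}.
\end{align*}
```

Here the supremum over $g$ is attained, which is typical of finite settings. In general the set where $F$ is close to $\|F\|_\infty$ only yields values of $\int_Y Fg\,d\nu$ that approach the supremum.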

The third step is to use the previous two steps to prove the inequality in the
statement of the theorem for $f \ge 0$. Let $g$ be any nonnegative function on $Y$ with
$\int_Y g\,d\nu \le 1$. Then Fubini’s Theorem, the result of the first step above, and the
result in the easy direction of the second step above give


$$\begin{aligned}
\int_Y \Bigl[ \int_X f(x,y)\,d\mu(x) \Bigr] g(y)\,d\nu(y)
 &= \int_X \Bigl[ \int_Y f(x,y)\,g(y)\,d\nu(y) \Bigr] d\mu(x) \\
 &\le \int_X \|f(x,y)\|_{\infty,\,d\nu(y)}\,d\mu(x).
\end{aligned}$$


Taking the supremum over g and using the result in the hard direction of the 
second step, we obtain the inequality in the statement of the theorem. 


10. Problems 


1. Let $X$ be a finite set of $n > 0$ elements.
(a) If $\mathcal{A}$ is an algebra of subsets, what are the possible numbers of sets in $\mathcal{A}$?
(b) Show that symmetric difference $A \mathbin{\Delta} B = (A - B) \cup (B - A)$ is an abelian
group operation on the set of all subsets of $X$ and that every nontrivial
element has order 2.
(c) If $\mathcal{B}$ is a class of subsets containing $\varnothing$ and $X$ and closed under symmetric
difference, what are the possible numbers of sets in $\mathcal{B}$?
(d) Prove or disprove: The class of sets in (c) is necessarily an algebra of sets.
(e) Show that intersection and symmetric difference satisfy the distributive law
$A \cap (B \mathbin{\Delta} C) = (A \cap B) \mathbin{\Delta} (A \cap C)$.
2. Exhibit a completely additive set function $\rho$ on a $\sigma$-algebra and two sets $A$ and
$B$ such that $\rho(A) < 0$ and $\rho(B) < 0$ but $\rho(A \cup B) > 0$.


3. Let $\{E_n\}$ be a sequence of subsets of $X$, and put

$$A = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k \quad\text{and}\quad B = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} E_k.$$

Prove that the indicator functions of $E_n$, $A$, and $B$ satisfy

$$I_A = \limsup_n I_{E_n} \quad\text{and}\quad I_B = \liminf_n I_{E_n}.$$

4. Suppose that $\mu$ is a finite measure defined on a $\sigma$-algebra and $\{E_n\}$ is a sequence
of measurable sets with

$$\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} E_k.$$

Call the set on the two sides of this equation $E$. Prove that $\lim_n \mu(E_n)$ exists and
equals $\mu(E)$.

5. Let $X$ be the set of rational numbers, and let $\mathcal{R}$ be the ring of all finite disjoint
unions of bounded intervals in $X$, with or without endpoints. For each set $E$ in
$\mathcal{R}$, let $\mu(E)$ be its length.
(a) Show that $\mu$ is nonnegative additive.
(b) Show that $\mu$ is not completely additive.

6. Prove that if $E$ is a Lebesgue measurable subset of $[0, 1]$ of Lebesgue measure 0,
then the complement of $E$ is dense in $[0, 1]$.

7. Let $\mu$ be a measure defined on a $\sigma$-algebra. Prove that if the complement of
every set of measure $+\infty$ is of finite measure, then $\sup_{\mu(A) < +\infty} \mu(A)$ is finite
and there is a set $B$ with $\mu(B) = \sup_{\mu(A) < +\infty} \mu(A)$.

8. If $f$ is a measurable function, prove that $f^{-1}(E)$ is measurable whenever $E$ is a
Borel subset of the real line.

9. For the measure space $(X, \mathcal{A}, \mu)$ in which $X$ is the positive integers, $\mathcal{A}$ consists
of all subsets of $X$, and $\mu$ is the counting measure, the theory of Lebesgue
integration becomes a theory of infinite series. Restate Fatou’s Lemma and the
Dominated Convergence Theorem in this context.

10. Suppose on a finite measure space that $\{f_n\}$ is a sequence of real-valued integrable
functions tending uniformly to $f$. Prove that $\lim_n \int_X f_n\,d\mu = \int_X f\,d\mu$.


11. This problem involves a Cantor set $C$ in $[0, 1]$ built using fractions $r_n$ as in Section
II.9.
(a) Show that $C$ has Lebesgue measure $\prod_{n=1}^{\infty} (1 - r_n)$.
(b) Prove that the indicator function $I_C$ is discontinuous at every point of $C$
and only there. Thus the set of discontinuities of $I_C$ is not of measure 0 if
$\prod_{n=1}^{\infty} (1 - r_n) > 0$.
(c) Show that if the result of redefining $I_C$ on a set of Lebesgue measure 0 is a
function $f$, then the only possible points of continuity of $f$ are those where
$f$ is 0.
(d) Conclude that there exists a Lebesgue measurable function on $[0, 1]$ that is
not Riemann integrable and cannot be redefined on a set of measure 0 so as
to be Riemann integrable.


12. Let $(X, \mathcal{A}, \mu)$ be any measure space, and let $(X, \bar{\mathcal{A}}, \bar\mu)$ be its completion. Prove
that if $f$ is a function measurable with respect to $\bar{\mathcal{A}}$, then $f$ can be redefined on
a set of $\bar\mu$-measure 0 so as to be measurable with respect to $\mathcal{A}$.

13. Let $X$ be an uncountable set, and let $\mathcal{A}$ be the set of all countable subsets of $X$
and their complements. Prove that the diagonal $\{(x, x) \mid x \in X\}$ is not a member
of the $\sigma$-algebra $\mathcal{A} \times \mathcal{A}$, the smallest $\sigma$-algebra containing all rectangles with
sides in $\mathcal{A}$.

14. Let $(\mathbb{R}^1, \mathcal{B}, m)$ be the real line with Lebesgue measure on the Borel sets, and let
$(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space. If $f \ge 0$ is a measurable function on $X$,
prove that the “region under the graph of $f$,” defined by

$$R = \{(x, y) \mid 0 \le y \le f(x)\},$$

is a measurable subset of $X \times \mathbb{R}^1$ and that its measure relative to $\mu \times m$ is
$\int_X f(x)\,d\mu(x)$.

15. Let $\mathcal{A}$ be a $\sigma$-algebra of subsets of a nonempty set $X$, let $F : \mathbb{C}^{n_1} \times \cdots \times \mathbb{C}^{n_k} \to \mathbb{C}^{N}$
be continuous, and let $f_j : X \to \mathbb{C}^{n_j}$ be measurable with respect to $\mathcal{A}$ for
$1 \le j \le k$. Prove that $x \mapsto F(f_1(x), \dots, f_k(x))$ is measurable with respect
to $\mathcal{A}$.

16. This problem complements the proof in Theorem 5.59 that $L^1$ is a complete
metric space. For $n \ge 1$, suppose that $0 \le a_n \le 1$ and $\sum_{n=1}^{\infty} a_n = +\infty$. Find a
measure space $(X, \mathcal{A}, \mu)$ and a sequence of functions $f_n$ with $\|f_n\|_1 = a_n$ and
$\{f_n(x)\}$ convergent for no $x$.
17. (Egoroff’s Theorem) Let $(X, \mathcal{A}, \mu)$ be a finite measure space. Suppose that
$f_n$ and $f$ are measurable functions with values in $\mathbb{R}$ such that $\lim f_n(x) = f(x)$
pointwise. The objective of this problem is to prove that $\lim f_n = f$ “almost
uniformly.” By considering the sets


$$E_{MN} = \{x \in X \mid |f_n(x) - f(x)| < 1/M \text{ for } n \ge N\}$$


for $M$ fixed and $N$ varying, prove that if $\epsilon > 0$ is given, then there exists a
measurable subset $E$ of $X$ with $\mu(E) < \epsilon$ such that $\lim f_n(x) = f(x)$ uniformly
for $x$ in $E^c$.

18. (a) Derive the Dominated Convergence Theorem for a space of finite measure
from Egoroff’s Theorem (Problem 17) and Corollary 5.24.
(b) Derive the Dominated Convergence Theorem for a space of infinite measure
from the Dominated Convergence Theorem for a space of finite measure.


Problems 19-21 use Egoroff’s Theorem (Problem 17) to show how close pointwise
convergence is to $L^1$ convergence on a measure space $(X, \mathcal{A}, \mu)$ of finite measure.
Theorem 5.59 shows that if a sequence converges in $L^1(X)$, then a subsequence
converges almost everywhere. These problems address the converse direction in a
way different from Problem 16. Suppose that $f_n$ and $f$ are integrable functions with
values in $\mathbb{R}$ such that $\lim f_n(x) = f(x)$ pointwise.

19. Suppose that $f_n \ge 0$ for all $n$ and that $\lim \int_X f_n\,d\mu = \int_X f\,d\mu$. Prove that
$\lim_n \int_E f_n\,d\mu = \int_E f\,d\mu$ for every measurable set $E$.


20. Suppose that $f_n \ge 0$ for all $n$ and that $\lim \int_X f_n\,d\mu = \int_X f\,d\mu$. Use the previous
problem and Egoroff’s Theorem to prove that $\lim \int_X |f_n - f|\,d\mu = 0$.


21. A sequence $\{g_n\}$ of nonnegative integrable functions is called uniformly
integrable if for any $\epsilon > 0$, there is an $N$ such that $\int_{\{x \mid g_n(x) \ge N\}} g_n\,d\mu < \epsilon$
for all $n$. Suppose that the members of the given convergent sequence $\{f_n\}$ are
nonnegative. Using Egoroff’s Theorem in one direction and the previous problem
in the converse direction, prove that $\lim_n \int_X f_n\,d\mu = \int_X f\,d\mu$ if and only if the
$f_n$ are uniformly integrable.


Problems 22-24 concern the extension of measures beyond what is given in Theorem
5.5 and Proposition 5.37. Let $\mu$ be a finite measure on a $\sigma$-algebra $\mathcal{A}$ of subsets of $X$,
and define $\mu_*$ and $\mu^*$ on all subsets of $X$ as in Lemma 5.32 and immediately after it.
Let $E$ be a subset of $X$ that is not in $\mathcal{A}$, and let $\mathcal{B}$ be the smallest $\sigma$-algebra containing
$E$ and the members of $\mathcal{A}$.


22. Show that there exist two sets $K$ and $U$ in $\mathcal{A}$ such that $K \subseteq E \subseteq U$, $\mu_*(E) = \mu(K)$,
and $\mu^*(E) = \mu(U)$. Show that $K$ and $U$ have the further properties that
$U^c \subseteq E^c \subseteq K^c$, $\mu_*(E^c) = \mu(U^c)$, and $\mu^*(E^c) = \mu(K^c)$.

23. Show that the sets $K$ and $U$ of the previous problem satisfy $\mu_*(A \cap E) = \mu(A \cap K)$
and $\mu^*(A \cap E) = \mu(A \cap U)$ for every $A$ in $\mathcal{A}$.


24. Fix $t$ in $[0, 1]$. Show that the set function $\sigma$ defined for $A$ and $B$ in $\mathcal{A}$ by

$$\sigma\bigl[(A \cap E) \cup (B \cap E^c)\bigr]
 = t\,\mu_*(A \cap E) + (1-t)\,\mu^*(A \cap E) + t\,\mu^*(B \cap E^c) + (1-t)\,\mu_*(B \cap E^c)$$

is defined on all of $\mathcal{B}$, is a measure, agrees with $\mu$ on $\mathcal{A}$, and assigns measure
$t\,\mu_*(E) + (1 - t)\,\mu^*(E)$ to the set $E$.


Problems 25-33 concern a construction by “transfinite induction” of all sets in the
smallest $\sigma$-algebra containing an algebra of sets. In particular, it describes how to
obtain all Borel sets of the interval $[0, 1]$ of the line from the elementary sets in that
interval. Later problems in the set apply the construction in various ways. This set of
problems makes use of partial orderings as described in Section A9 of the appendix,
but they do not use Zorn’s Lemma. The set of countable ordinals is an uncountable
partially ordered set $\Omega$, under a partial ordering $\le$, with the following properties:
(i) $\le$ has the property that $x \le y$ and $y \le x$ together imply $x = y$,
(ii) $\Omega$ is “totally ordered” in the sense that any $x$ and $y$ in the set have either
$x \le y$ or $y \le x$,
(iii) $\Omega$ is “well ordered” in the sense that any nonempty subset has a least element,
(iv) for any $x$ in $\Omega$, the set of elements $\le x$ is at most countable.


Take as known that such a set $\Omega$ exists.


25. Prove that any countable subset of $\Omega$ has a least upper bound.


26. This problem asks for a proof of the validity of transfinite induction as applied
to $\Omega$. Let 1 be the least element of $\Omega$, and let “$<$” mean “$\le$ but not $=$.” Suppose
that some statement $p(\omega)$ is specified for each $\omega$ in $\Omega$. Suppose further that $p(1)$ is true
and that, for each $\omega > 1$, if $p(\omega')$ is true for all $\omega' < \omega$, then $p(\omega)$ is true. Prove
that $p(\omega)$ is true for all $\omega$ in $\Omega$.


27. Let $X$ be a nonempty set, let $\mathcal{A}$ be an algebra of subsets of $X$, and let $\mathcal{B}$ be the
smallest $\sigma$-algebra containing $\mathcal{A}$. This problem uses $\Omega$ to describe
“constructively” $\mathcal{B}$ in terms of $\mathcal{A}$. We define by transfinite induction two successively
larger classes of sets $\mathcal{U}_\alpha$ and $\mathcal{K}_\alpha$ for each countable ordinal $\alpha \ge 1$. Let $\mathcal{U}_1$ be
the set of all countable increasing unions of members of $\mathcal{A}$, let $\mathcal{K}_\alpha$ for $\alpha \ge 1$ be
the set of all countable decreasing intersections of members of $\mathcal{U}_\alpha$, and let $\mathcal{U}_\alpha$ for
$\alpha > 1$ be the set of all countable increasing unions of members of previous $\mathcal{K}_\beta$’s.
(a) Prove at each stage $\alpha$ that $\mathcal{U}_\alpha$ and $\mathcal{K}_\alpha$ are both closed under finite unions and
finite intersections.
(b) Prove that $\mathcal{B}$ is the union of all $\mathcal{K}_\alpha$ for $\alpha$ in $\Omega$.


28. For the case that $\nu(X) < +\infty$, prove the uniqueness half of the Extension
Theorem (Theorem 5.5) by using the transfinite construction of Problem 27.
[Educational note: It is not known how to prove the existence half of the Extension
Theorem in this “constructive” way.]


29. Prove the Monotone Class Lemma (Lemma 5.43) by making use of the transfinite
construction of Problem 27.


30. Devise a transfinite construction of all finite-valued Borel measurable functions
on $\mathbb{R}^1$ that starts from continuous functions and alternately allows pointwise
increasing limits and pointwise decreasing limits. The construction is to be in
the spirit of Problem 27. Show that all finite-valued Borel measurable functions
are obtained in this way if the indexing is done with $\Omega$.


31. This problem “counts” the number of Borel sets of the real line, using Problem 27.
It uses the material on cardinality in Section A10 of the appendix.
(a) Prove that
(i) $\Omega$ has the same cardinality as some subset of $\mathbb{R}$,
(ii) the set of all sequences of members of $\mathbb{R}$ has the same cardinality
as $\mathbb{R}$,
(iii) if $A \subseteq B \subseteq C$ and if $A$ and $C$ have the same cardinality as $\mathbb{R}$, then
so does $B$,
(iv) if a set $A$ has the same cardinality as $\mathbb{R}$ and if for each $a$ in $A$, $B_a$
is a set with the same cardinality as $\mathbb{R}$, then $\bigcup_{a \in A} B_a$ has the same
cardinality as $\mathbb{R}$.
(b) Deduce that the set of all Borel sets of $\mathbb{R}$ has the same cardinality as $\mathbb{R}$ itself.




32. The standard Cantor set $C$ in $[0, 1]$, built using fractions $r_n = 1/3$ as in Section
II.9, is a Borel set of Lebesgue measure 0 by Problem 11. Prove that $C$ has the
same cardinality as $\mathbb{R}$. Conclude that the cardinality of the set of all Lebesgue
measurable sets equals the cardinality of the set of all subsets of $\mathbb{R}$. [Educational
note: From this and Problem 31 it follows that there exists a Lebesgue measurable
set in $[0, 1]$ that is not a Borel set.]

33. For the standard Cantor set $C$ as in the previous problem, show that the indicator
function $I_{C'}$ of any subset $C'$ of $C$ is continuous on $C^c$. Conclude that the
cardinality of the set of Riemann integrable functions on $[0, 1]$ equals the cardinality
of the set of all subsets of $\mathbb{R}$. [Educational note: From this and Problems 30-31,
it follows that there exists a Riemann integrable function on $[0, 1]$ that is not
Borel measurable.]


Problems 34-41 show how to produce nontrivial nonnegative additive set functions on
the set of all subsets of an infinite set from Zorn’s Lemma (Section A9 of the appendix).
A filter $\mathcal{F}$ on a nonempty set $X$ is a nonempty class of subsets of $X$ such that
(i) if $E$ is in $\mathcal{F}$ and $F \supseteq E$, then $F$ is in $\mathcal{F}$, i.e., $\mathcal{F}$ is closed under the operation
of forming supersets,
(ii) if $E$ and $F$ are in $\mathcal{F}$, so is $E \cap F$,
(iii) $\varnothing$ is not in $\mathcal{F}$.
An ultrafilter is a filter that is not properly contained in any larger filter.
34. Verify the following:
(a) $\{X\}$ is a filter.
(b) Any filter is closed under finite intersections.
(c) A one-point set and all of its supersets form an ultrafilter. (Such an ultrafilter
is called a trivial ultrafilter.)
(d) If $X$ is infinite, then the set $\mathcal{F}$ of all subsets whose complements are finite
sets is a filter.


35. Use Zorn’s Lemma to show that every filter is contained in some ultrafilter.


36. Show that if $\mathcal{C}$ is a nonempty class of subsets of $X$, then there is a filter containing
$\mathcal{C}$ if and only if no finite intersection of members of $\mathcal{C}$ is empty.


37. Prove that a filter $\mathcal{F}$ is an ultrafilter if and only if $A \cup B$ in $\mathcal{F}$ implies that either
$A$ is in $\mathcal{F}$ or $B$ is in $\mathcal{F}$.


38. Prove that a filter $\mathcal{F}$ is an ultrafilter if and only if for every $A \subseteq X$, either $A$ is in
$\mathcal{F}$ or $A^c$ is in $\mathcal{F}$.

39. Prove that the nonzero additive set functions defined on the set of all subsets
of a set $X$ and having image $\{0, 1\}$ stand in one-one correspondence with the
ultrafilters on $X$, the correspondence being that the sets in the ultrafilter are
exactly the sets on which the set function is 1. Prove that the set function is
a measure if and only if the corresponding ultrafilter is closed under countable
intersections.

40. Let $X$ be any infinite set. Prove that $X$ has a nontrivial ultrafilter, hence that $X$
has a nonnegative additive set function $\mu$ that assumes only the values 0 and 1
and is not a point mass.


41. Prove that the set $\mathbb{Z}^+$ of positive integers has no nontrivial ultrafilter closed under
countable intersections, i.e., that the set function $\mu$ in the previous problem is
not a measure.


Problems 42-43 concern a theory of integration in which complete additivity is
dropped as an assumption. An example is given in Problems 39-41 of a nonnegative
additive set function on the set of all subsets of an infinite set that is not completely
additive. For the present set of problems, let $X$ be a nonempty set, let $\mathcal{A}$ be a
$\sigma$-algebra of subsets, and let $\mu$ be a nonnegative additive set function on $\mathcal{A}$ such that
$\mu(X) < +\infty$. Imagine an integration theory for $\int_E f\,d\mu$ with the definitions just
as in the case that $\mu$ is a measure. All the properties of the integral proved in the
text before the Monotone Convergence Theorem would still be valid, except that the
integral $\int_E f\,d\mu$ as a function of $E$ would be merely additive, rather than completely
additive, and hence we would have to drop Corollary 5.24 and the converse half of
Corollary 5.23.


42. Let $f$ be $\ge 0$, and let $s_n$ be the standard pointwise increasing sequence of simple
functions with limit $f$, as in Proposition 5.11. Show that the convergence of $s_n$
to $f$ is uniform if $f$ is bounded.


43. Use the result of the previous problem to show in this theory that
$\int_E (f + g)\,d\mu = \int_E f\,d\mu + \int_E g\,d\mu$ if $f$ and $g$ are bounded and measurable.


CHAPTER VI 


Measure Theory for Euclidean Space 


Abstract. This chapter mines some of the powerful consequences of the basic measure theory in 
Chapter V. 

Sections 1-3 establish properties of Lebesgue measure and other Borel measures on Euclidean 
space and on open subsets of Euclidean space. The main general property is the regularity of all
such measures, namely that the measure of any Borel set can be approximated by the measure of compact
sets from within and open sets from without. Lebesgue measure in all of Euclidean space has an
additional property, translation invariance, which allows for the notion of the convolution of two 
functions. Convolution gives a kind of moving average of the translates of one function weighted 
by the other function. Convolution with the dilates of a fixed integrable function provides a handy 
kind of approximate identity. 

Section 4 gives the final form of the comparison of the Riemann and Lebesgue integrals, a 
preliminary form having been given in Chapter III. 

Section 5 gives the final form of the change-of-variables theorem for integration, starting from 
the preliminary form of the theorem in Chapter III and taking advantage of the ease with which 
limits can be handled by the Lebesgue integral. Sard’s Theorem allows one to disregard sets of 
lower dimension in establishing such changes of variables, thereby giving results in their expected 
form rather than in a form dictated by technicalities. 

Section 6 concerns the Hardy-Littlewood Maximal Theorem in N dimensions. In dimension 1,
this theorem implies that the derivative of a 1-dimensional Lebesgue integral with respect to Lebesgue
measure recovers the integrand almost everywhere. The theorem in the general case implies that
certain averages of a function over small sets about a point tend to the function almost everywhere. But
the theorem can be regarded as saying also that a particular approximate identity formed by dilations
applies to problems of almost-everywhere convergence, as well as to problems of norm convergence
and uniform convergence. A corollary of the theorem is that many approximate identities formed
by dilations yield almost-everywhere convergence theorems.

Section 7 redevelops the beginnings of the subject of Fourier series using the Lebesgue integral,
the theory having been developed with the Riemann integral in Section I.10. With the Lebesgue
integral and its accompanying tools, Fourier series are meaningful for more functions than before,
Dini’s test applies even to a wider class of Riemann integrable functions than before, and Fejér’s
Theorem and Parseval’s Theorem become easier and more general than before. A completely new
result with the Lebesgue integral is the Riesz-Fischer Theorem, which characterizes the trigonometric
series that are Fourier series of square-integrable functions.

Sections 8-10 deal with Stieltjes measures, which are Borel measures on the line, and their
application to Fourier series. Such measures are characterized in terms of a class of monotone
functions on the line, and they lead to a handy generalization of the integration-by-parts formula.
This formula allows one to bound the size of the Fourier coefficients of functions of bounded variation,
which are differences of monotone functions. In combination with earlier results, this bound yields
the Dirichlet-Jordan Theorem, which says that the Fourier series of a function of bounded variation
converges pointwise everywhere, the convergence being uniform on any compact set on which the
function is continuous. Section 10 is a short section on computation of integrals.


1. Lebesgue Measure and Other Borel Measures 


Lebesgue measure on $\mathbb{R}^1$ was constructed in Section V.1 on the ring of
“elementary” sets (the finite disjoint unions of bounded intervals) and extended
from there to the $\sigma$-algebra of Borel sets by the Extension Theorem (Theorem
5.5), which was proved in Section V.5. Fubini’s Theorem (Theorem 5.47) would
have allowed us to build Lebesgue measure in $\mathbb{R}^N$ as an iterated product of
1-dimensional Lebesgue measure, but we postponed the construction in $\mathbb{R}^N$
until the present chapter in order to show that it can be carried out in a fashion
independent of how we group 1-dimensional factors.

The Borel sets of $\mathbb{R}^1$ are, by definition, the sets in the smallest $\sigma$-algebra
containing the elementary sets, and we saw readily that every set that is open
or compact is a Borel set. We write $\mathcal{B}_1$ for this $\sigma$-algebra. In fact, $\mathcal{B}_1$ may
be described as the smallest $\sigma$-algebra containing the open sets of $\mathbb{R}^1$ or as the
smallest $\sigma$-algebra containing the compact sets. The reason that the open sets
generate $\mathcal{B}_1$ is that every open interval is an open set, and every bounded interval is a
countable intersection of open intervals. Similarly the compact sets generate $\mathcal{B}_1$
because every closed bounded interval is a compact set, and every bounded interval is the
countable union of closed bounded intervals.
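To make the last two claims concrete, here is a sketch of the relevant formulas for bounded intervals with endpoints $a < b$. The endpoints $a$, $b$ are illustrative symbols, and a closed interval $[c, d]$ with $c > d$ is understood to be empty.

```latex
% Bounded intervals as countable intersections of open intervals
% and as countable unions of closed bounded intervals (a < b assumed).
\begin{align*}
[a,b] &= \bigcap_{n=1}^{\infty}\Bigl(a-\tfrac1n,\; b+\tfrac1n\Bigr), &
[a,b) &= \bigcap_{n=1}^{\infty}\Bigl(a-\tfrac1n,\; b\Bigr),\\
(a,b) &= \bigcup_{n=1}^{\infty}\Bigl[a+\tfrac1n,\; b-\tfrac1n\Bigr], &
[a,b) &= \bigcup_{n=1}^{\infty}\Bigl[a,\; b-\tfrac1n\Bigr].
\end{align*}
```

The case $(a, b]$ is handled symmetrically. Since the elementary sets are finite unions of such intervals, either family generates the same $\sigma$-algebra.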

Now let us turn our attention to $\mathbb{R}^N$. We have already used the word “rectangle”
in two different senses in connection with integration: in Chapter III to mean an
$N$-fold product along coordinate directions of open or closed bounded intervals,
and in Chapter V to mean a product of measurable sets. For clarity let us refer to
any product of bounded intervals as a geometric rectangle and to any product of
measurable sets as an abstract rectangle or an abstract rectangle in the sense of
Fubini’s Theorem. In $\mathbb{R}^N$, every geometric rectangle under our definition is an
abstract rectangle, but not conversely.

Define the Borel sets of $\mathbb{R}^N$ to be the members of the smallest $\sigma$-algebra $\mathcal{B}_N$
containing all compact sets in $\mathbb{R}^N$. It is equivalent to let $\mathcal{B}_N$ be the smallest
$\sigma$-algebra containing all open sets. In fact, every open geometric rectangle is the
countable union of compact geometric rectangles, and every open set in turn is
the countable union of open geometric rectangles; thus the open sets are in the
smallest $\sigma$-algebra containing the compact sets. In the reverse direction every
closed set is the complement of an open set, and every compact set is closed; thus
the compact sets are in the smallest $\sigma$-algebra containing the open sets.

Functions on $\mathbb{R}^N$ measurable with respect to $\mathcal{B}_N$ are called Borel measurable
functions or Borel functions. Any continuous real-valued function $f$ on $\mathbb{R}^N$
is Borel measurable because the inverse image $f^{-1}((c, +\infty])$ of the open set
$(c, +\infty]$ has to be open and therefore has to be a Borel set.


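Proposition 6.1. If $m$ and $n$ are integers $\ge 1$, then $\mathcal{B}_m \times \mathcal{B}_n = \mathcal{B}_{m+n}$ within
the product set $\mathbb{R}^m \times \mathbb{R}^n = \mathbb{R}^{m+n}$.


PROOF. If $U$ is open in $\mathbb{R}^m$ and $V$ is open in $\mathbb{R}^n$, then $U \times V$ is open in $\mathbb{R}^{m+n}$,
and it follows that $\mathcal{B}_m \times \mathcal{B}_n \subseteq \mathcal{B}_{m+n}$. For the reverse inclusion, let $W$ be open
in $\mathbb{R}^{m+n}$. Then $W$ is the countable union of open geometric rectangles, and each
of these is of the form $U \times V$ with $U$ open in $\mathbb{R}^m$ and $V$ open in $\mathbb{R}^n$. Since
each such $U \times V$ is in $\mathcal{B}_m \times \mathcal{B}_n$, so is $W$. Thus we obtain the reverse inclusion
$\mathcal{B}_{m+n} \subseteq \mathcal{B}_m \times \mathcal{B}_n$.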


Lebesgue measure on $\mathbb{R}^N$ will, at least initially, be a measure defined on the
$\sigma$-algebra $\mathcal{B}_N$. Proposition 6.1 tells us that the $\sigma$-algebra on which the measure is
to be defined is independent of the grouping of variables used in Fubini’s Theorem.
It will be quite believable that different constructions of Lebesgue measure by
using different iterated product decompositions of $\mathbb{R}^N$, such as $(\mathbb{R}^1 \times \mathbb{R}^1) \times \mathbb{R}^1$
and $\mathbb{R}^1 \times (\mathbb{R}^1 \times \mathbb{R}^1)$, will lead to the same measure, but we shall give two abstract
characterizations of the result that will ensure uniqueness without any act of
faith. These characterizations will take some moments to establish, but we shall
obtain useful additional results along the way. The procedure will be to state the
constructions of the measure via Fubini’s Theorem, then to consider a wider class
of measures on $\mathcal{B}_N$ known as the “Borel measures,” and finally to establish the
two characterizations of Lebesgue measure among all Borel measures on $\mathbb{R}^N$.

It is customary to write $dx$ in place of $dm(x)$ for Lebesgue measure on $\mathbb{R}^1$, and
we shall do so except when there is some special need for the symbol $m$. Then the
notation for the measure normally becomes an expression like $dx$ or $dy$ instead
of $m$. To construct Lebesgue measure $dx$ on $\mathbb{R}^N$, we can proceed inductively,
adding one variable at a time. Fubini’s Theorem allows us to construct the product
of Lebesgue measure on $\mathbb{R}^{N-1}$ and Lebesgue measure on $\mathbb{R}^1$, and Proposition
6.1 shows that the result is defined on the Borel sets of $\mathbb{R}^N$. Let us take this
particular construction as an inductive definition of Lebesgue measure on $\mathbb{R}^N$.
It is apparent from the construction that the measure of a geometric rectangle is
the product of the lengths of the sides.
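The last assertion can be spelled out as follows. This is only a sketch: the names $m_N$ for Lebesgue measure on $\mathbb{R}^N$, $R'$ for a geometric rectangle in $\mathbb{R}^{N-1}$, and $a_j \le b_j$ for the endpoints of the sides are notation introduced here and not taken from the text.

```latex
% Assumed notation: m_N = Lebesgue measure on R^N,
% R = R' x [a_N, b_N] with R' = [a_1,b_1] x ... x [a_{N-1},b_{N-1}].
\begin{align*}
m_N(R) &= \int_{\mathbb{R}^{N-1}} \Bigl[ \int_{\mathbb{R}^1} I_{R'}(x')\, I_{[a_N,b_N]}(x_N)\,dx_N \Bigr] dx' \\
       &= (b_N - a_N) \int_{\mathbb{R}^{N-1}} I_{R'}(x')\,dx'
        = (b_N - a_N)\, m_{N-1}(R'),
\end{align*}
% and induction on N gives
\[
m_N(R) = \prod_{j=1}^{N} (b_j - a_j).
\]
```

The same computation applies when some sides are open or half-open, since single points have 1-dimensional Lebesgue measure 0.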

Alternatively, we could construct Lebesgue measure on $\mathbb{R}^N$ inductively by
grouping $\mathbb{R}^N$ as some other $\mathbb{R}^m \times \mathbb{R}^{N-m}$ and using the product measure from
versions of the Lebesgue measures on $\mathbb{R}^m$ and $\mathbb{R}^{N-m}$. Again the result has the
property that the measure of a geometric rectangle is the product of the lengths of
the sides. It is believable that this condition determines completely the measure
on $\mathbb{R}^N$, and we shall give a proof of this uniqueness shortly.
A Borel measure on $\mathbb{R}^N$ is a measure on the $\sigma$-algebra $\mathcal{B}_N$ of Borel sets of
$\mathbb{R}^N$ that is finite on every compact set. A key property of Borel measures on $\mathbb{R}^N$
is their regularity as expressed in Theorem 6.2 below. The theorem makes use of
two simple properties of $\mathbb{R}^N$:


(i) there exists a sequence $\{F_n\}_{n=1}^{\infty}$ of compact sets with union the whole
space such that $F_n \subseteq F_{n+1}^{o}$ for all $n$,

(ii) for any compact set $K$, there exists a decreasing sequence of open sets
$U_n$ with compact closure such that $\bigcap_{n=1}^{\infty} U_n = K$.


For (i), we can take $F_n$ to be the closed ball of radius $n$ centered at the origin.
For (ii), we can take $U_n = \{x \mid D(x, K) < 1/n\}$ if $K \ne \varnothing$, and we can take all
$U_n = \varnothing$ if $K = \varnothing$.
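For readers who want the verification of (ii) written out, a brief sketch follows. It assumes that $D(x, K) = \inf_{y \in K} |x - y|$ denotes the distance from $x$ to $K$ and that $K$ lies in the closed ball of some radius $R$; the symbol $R$ is introduced here only for the argument.

```latex
% Assumed: D(x,K) = inf_{y in K} |x - y|, K nonempty compact, K inside the ball |x| <= R.
\begin{align*}
&x \mapsto D(x,K) \text{ is continuous, so } U_n = \{x \mid D(x,K) < 1/n\} \text{ is open};\\
&U_{n+1} \subseteq U_n \text{ because } 1/(n+1) < 1/n;\\
&U_n \subseteq \{x \mid |x| < R + 1\}, \text{ so the closure of } U_n \text{ is closed and bounded, hence compact};\\
&\bigcap_{n=1}^{\infty} U_n = \{x \mid D(x,K) = 0\} = K, \text{ the last equality holding because } K \text{ is closed}.
\end{align*}
```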


Theorem 6.2. Every Borel measure $\mu$ on $\mathbb{R}^N$ is regular in the sense that the
value of $\mu$ on any Borel set $E$ is given by


$$\mu(E) = \sup_{\substack{K \subseteq E,\\ K \text{ compact}}} \mu(K) = \inf_{\substack{U \supseteq E,\\ U \text{ open}}} \mu(U).$$


REMARK. This conclusion is new for us even for $\mathbb{R}^1$. Although regularity of
1-dimensional Lebesgue measure was introduced before Proposition 5.4, it was
established only for the elementary sets at that time.
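A standard example, supplied here as an illustration and not taken from the text, shows the two approximations at work for a Borel set that is far from elementary. Take $E$ to be the set of rationals in $[0, 1]$ with Lebesgue measure $m$, enumerated as $q_1, q_2, \dots$ (the enumeration is an assumption of the example).

```latex
% E = Q \cap [0,1] = {q_1, q_2, ...}; m = Lebesgue measure on R^1.
\begin{align*}
U_\epsilon &= \bigcup_{k=1}^{\infty} \bigl(q_k - \epsilon\,2^{-k-1},\; q_k + \epsilon\,2^{-k-1}\bigr)
   \text{ is open, contains } E, \text{ and has }
   m(U_\epsilon) \le \sum_{k=1}^{\infty} \epsilon\,2^{-k} = \epsilon,\\
&\text{so } \inf_{U \supseteq E,\ U \text{ open}} m(U) = 0 = m(E);\\
&\text{every compact } K \subseteq E \text{ is countable, so } m(K) = 0 \text{ and }
   \sup_{K \subseteq E,\ K \text{ compact}} m(K) = 0 = m(E).
\end{align*}
```

Note that every open set containing $E$ is dense in $[0,1]$, so the open sets $U_\epsilon$ approximate $E$ in measure without approximating it in any topological sense.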


PROOF. We shall begin by showing for each Borel set $E$ and for any $\epsilon > 0$ that


$$\text{there exist closed } C \text{ and open } U \text{ such that } C \subseteq E \subseteq U \text{ and } \mu(U - C) < \epsilon. \tag{$*$}$$


Let $\mathcal{A}$ be the set of Borel sets $E$ for which $(*)$ holds for all $\epsilon > 0$.

If $E$ is compact, then we can take $C = E$ and $U = U_n$ as in (ii) for a suitable
$n$ in order to prove $(*)$; Corollary 5.3 gives us $\lim_n \mu(U_n - C) = 0$, since the
compact closure of $U_n$ forces $\mu(U_n)$ to be finite. Therefore $\mathcal{A}$ contains all compact
sets.

To see that $\mathcal{A}$ is closed under complements, suppose $E$ is in $\mathcal{A}$. Let $\epsilon > 0$
be given and choose, by $(*)$ for $E$, a closed set $C$ and an open set $U$ such that
$C \subseteq E \subseteq U$ and $\mu(U - C) < \epsilon$. Taking complements, we have $U^c \subseteq E^c \subseteq C^c$
and $\mu(C^c - U^c) = \mu(U - C) < \epsilon$. Thus $E^c$ is in $\mathcal{A}$.

Let us see that $\mathcal{A}$ is closed under finite unions. Suppose that $E_1$ and $E_2$ are
in $\mathcal{A}$. Let $\epsilon > 0$ be given and choose, by $(*)$ for $E_1$ and $E_2$, two closed sets $C_1$
and $C_2$ and two open sets $U_1$ and $U_2$ such that $C_1 \subseteq E_1 \subseteq U_1$, $\mu(U_1 - C_1) < \epsilon$,
$C_2 \subseteq E_2 \subseteq U_2$, and $\mu(U_2 - C_2) < \epsilon$. Then $C_1 \cup C_2 \subseteq E_1 \cup E_2 \subseteq U_1 \cup U_2$
and $\mu((U_1 \cup U_2) - (C_1 \cup C_2)) \le \mu(U_1 - C_1) + \mu(U_2 - C_2) < 2\epsilon$. Since $\epsilon$ is
arbitrary, $E_1 \cup E_2$ is in $\mathcal{A}$. Hence $\mathcal{A}$ is closed under finite unions, and $\mathcal{A}$ is an
algebra of sets.
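The estimate for the union rests on a set inclusion that the proof uses without comment. A short justification, with notation as in the paragraph above, is the following.

```latex
% Why mu((U_1 u U_2) - (C_1 u C_2)) <= mu(U_1 - C_1) + mu(U_2 - C_2).
\begin{align*}
x \in (U_1 \cup U_2) - (C_1 \cup C_2)
 &\implies x \in U_i \text{ for some } i \in \{1,2\}, \text{ and } x \notin C_1,\ x \notin C_2 \\
 &\implies x \in U_i - C_i \text{ for that } i,
\end{align*}
\[
\text{so}\quad (U_1 \cup U_2) - (C_1 \cup C_2) \subseteq (U_1 - C_1) \cup (U_2 - C_2),
\]
% and monotonicity together with finite subadditivity of mu gives the stated bound.
```

The same inclusion, applied to countably many sets, is what is used for countable unions in the next step of the proof.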

The proof that $\mathcal{A}$ is closed under countable unions takes two steps. For the
first step we let a sequence of sets $E_n$ in $\mathcal{A}$ be given with union $E$, and first
assume that all $E_n$ lie in one of the sets $F_M$ in (i) above. Let $\epsilon > 0$ be given
and choose, by $(*)$ for each $E_n$, closed sets $C_n$ and open sets $U_n$ such that
$C_n \subseteq E_n \subseteq U_n$ and $\mu(U_n - C_n) < \epsilon/2^n$. Possibly by intersecting $U_n$ with $F_{M+1}^{o}$, we
may assume that all $U_n$ lie in the compact set $F_{M+1}$. Set $U = \bigcup_{n=1}^{\infty} U_n$ and
$C = \bigcup_{n=1}^{\infty} C_n$. Then $C \subseteq E \subseteq U$ with $U$ open but $C$ not necessarily closed.
Nevertheless, we have $U - C \subseteq \bigcup_{n=1}^{\infty} (U_n - C_n)$, and Proposition 5.1g gives
$\mu(U - C) \le \sum_{n=1}^{\infty} \mu(U_n - C_n) < \epsilon$. The sets $S_m = U - \bigcup_{n=1}^{m} C_n$ form a
decreasing sequence within $F_{M+1}$ with intersection $U - C$. Since $\mu(F_{M+1})$ is
finite, Corollary 5.3 shows that $\mu(S_m)$ decreases to $\mu(U - C)$, which is $< \epsilon$.
Thus there is some $m = m_0$ with $\mu(S_{m_0}) < \epsilon$. The set $C' = \bigcup_{n=1}^{m_0} C_n$ is closed,
and we have $C' \subseteq E \subseteq U$ and $\mu(U - C') = \mu(S_{m_0}) < \epsilon$. Therefore $E$ is in $\mathcal{A}$.
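Two points in this step deserve a line of computation: the choice of $\epsilon/2^n$, and the role of the finiteness of $\mu(F_{M+1})$. The sketch below uses the notation of the paragraph above; the counterexample with Lebesgue measure $m$ on $\mathbb{R}^1$ is an illustration added here.

```latex
% The geometric series makes the countably many errors add up to at most epsilon:
\[
\mu(U - C) \le \sum_{n=1}^{\infty} \mu(U_n - C_n) < \sum_{n=1}^{\infty} \frac{\epsilon}{2^{n}} = \epsilon .
\]
% Continuity from above needs a set of finite measure somewhere in the sequence:
\[
S_1 \supseteq S_2 \supseteq \cdots,\quad \bigcap_{m=1}^{\infty} S_m = U - C,\quad
\mu(S_1) \le \mu(F_{M+1}) < \infty
\ \Longrightarrow\ \lim_{m} \mu(S_m) = \mu(U - C).
\]
% Without finiteness the conclusion can fail: for Lebesgue measure m on R^1,
\[
S_m = [m, +\infty) \text{ has } m(S_m) = +\infty \text{ for all } m, \text{ yet } \bigcap_{m=1}^{\infty} S_m = \varnothing .
\]
```

This is why the proof first confines all the sets to a single $F_{M+1}$ and treats the general case separately.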

For the second step we let the sets $E_n$ be general members of $\mathcal{A}$. Since $\mathcal{A}$ is an
algebra, $E_n \cap (F_{m+1} - F_m)$ is in $\mathcal{A}$ for every $n$ and $m$. Applying the previous step,
we see that $E'_m = E \cap (F_{m+1} - F_m)$ is in $\mathcal{A}$ for every $m$. The sets $E'_m$ have union
$E$, and $E'_m$ is contained in $F_{m+1} - F_m$. Changing notation, we may assume that the
given sets $E_n$ all have $E_n \subseteq F_{n+1} - F_n$. If $\epsilon > 0$ is given, construct $U_n$ open and
$C_n$ closed as in the previous paragraph except that $U_n$ is not constrained to lie in a
particular $F_M$. Again let $U = \bigcup_{n=1}^{\infty} U_n$ and $C = \bigcup_{n=1}^{\infty} C_n$, so that $C \subseteq E \subseteq U$
and $\mu(U - C) < \epsilon$. The set $U$ is open, and this time we can prove that the set $C$
is closed. In fact, let $\{x_n\}$ be a sequence in $C$ convergent to some limit point $x_0$
of $C$. The point $x_0$ is in some $F_M$ since the sets $F_M$ have union the whole space.
Since $F_M \subseteq F_{M+1}^{o}$ and $F_{M+1}^{o}$ is open, the sequence is eventually in $F_{M+1}^{o}$. The
inclusion $C_n \subseteq E_n \subseteq F_{n+1} - F_n$ shows that $C_n \cap F_{M+1}^{o} = \varnothing$ for $n \ge M + 1$.
Thus no term of the sequence after some point lies in $C_{M+1}, C_{M+2}, \dots$, i.e., all
the terms of the sequence after some point lie in $\bigcup_{n=1}^{M} C_n$. This is a closed set,
and the limit $x_0$ must lie in it. Therefore $x_0$ lies in $C$, and $C$ is closed. This proves
that $E$ is in $\mathcal{A}$. Hence $\mathcal{A}$ is a $\sigma$-algebra and must contain all Borel sets.

From (*) for all Borel sets, it follows that every Borel set $E$ satisfies

\[
\mu(E) \;=\; \sup_{\substack{C \subseteq E,\\ C \text{ closed}}} \mu(C)
\;=\; \inf_{\substack{U \supseteq E,\\ U \text{ open}}} \mu(U). \tag{$**$}
\]

Proposition 5.2 shows that the sets $F_n$ of (i) have the property that
$\mu(C) = \sup_n \mu(C \cap F_n)$ for every Borel set $C$. When $C$ is closed,
the sets $C \cap F_n$ are compact, and thus (**) implies the equality asserted
in the statement of the theorem. This completes the proof.


Recall from Section III.10 that the support of a scalar-valued function on a
metric space is the closure of the set where it is nonzero. Let
$C_{\mathrm{com}}(\mathbb{R}^N)$ be the space of continuous scalar-valued
functions on $\mathbb{R}^N$ of compact support. If there is no special mention
of the scalars, the scalars may be either real or complex.

If $K$ is a compact set and the open sets $U_n$ are as in (ii) before Theorem
6.2, Proposition 2.30e gives us continuous functions
$f_n : \mathbb{R}^N \to [0, 1]$ such that $f_n$ is 1 on $K$ and is 0 on $U_n^c$.
The support of the function $f_n$ is then contained in $U_n^{\mathrm{cl}}$,
which is compact. By replacing the functions $f_n$ by
$g_n = \min\{f_1, \dots, f_n\}$, we may assume that they are pointwise
decreasing. Consequently

(iii) there exists a decreasing sequence of real-valued members of
$C_{\mathrm{com}}(\mathbb{R}^N)$ with pointwise limit the indicator function
of $K$.


Corollary 6.3. If $\mu$ and $\nu$ are Borel measures on $\mathbb{R}^N$ such that
$\int_{\mathbb{R}^N} f\,d\mu = \int_{\mathbb{R}^N} f\,d\nu$ for all continuous
functions on $\mathbb{R}^N$ of compact support, then $\mu = \nu$.


PROOF. Let $K$ be a compact subset of $\mathbb{R}^N$, and use (iii) to choose a
decreasing sequence $\{f_n\}$ of real-valued members of
$C_{\mathrm{com}}(\mathbb{R}^N)$ with pointwise limit the indicator function
$I_K$. Since $f_1$ is integrable, dominated convergence allows us to deduce
$\int_{\mathbb{R}^N} I_K\,d\mu = \int_{\mathbb{R}^N} I_K\,d\nu$ from the
equality $\int_{\mathbb{R}^N} f_n\,d\mu = \int_{\mathbb{R}^N} f_n\,d\nu$ for
all $n$. Thus $\mu(K) = \nu(K)$ for every compact set $K$. Applying Theorem 6.2,
we obtain $\mu(E) = \nu(E)$ for every Borel set $E$.


Corollary 6.4. Let $p = 1$ or $p = 2$. If $\mu$ is a Borel measure on
$\mathbb{R}^N$, then
(a) $C_{\mathrm{com}}(\mathbb{R}^N)$ is dense in $L^p(\mathbb{R}^N, \mu)$,
(b) the smallest closed subspace of $L^p(\mathbb{R}^N, \mu)$ containing all
indicator functions of compact sets in $\mathbb{R}^N$ is
$L^p(\mathbb{R}^N, \mu)$ itself.


REMARK. The scalars are assumed to be the same for
$C_{\mathrm{com}}(\mathbb{R}^N)$ as for $L^1(\mathbb{R}^N, \mu)$ and
$L^2(\mathbb{R}^N, \mu)$; the corollary is valid both for real scalars and for
complex scalars.


PROOF. If $E$ is a Borel set of finite $\mu$ measure and if $\varepsilon > 0$ is
given, Theorem 6.2 allows us to choose a compact set $K$ with $K \subseteq E$
and $\mu(E - K) < \varepsilon$. Then
$\int_{\mathbb{R}^N} |I_E - I_K|^p\,d\mu = \mu(E - K) < \varepsilon$, and
consequently the closure in $L^p(\mathbb{R}^N)$ of the set of all indicator
functions of compact sets contains all indicator functions of Borel sets of
finite $\mu$ measure. Proposition 5.56 shows consequently that the smallest
closed subspace of $L^p(\mathbb{R}^N)$ containing all indicator functions of
compact sets is $L^p(\mathbb{R}^N)$ itself. This proves (b).

For (a), let $K$ be compact, and use (iii) to choose a decreasing sequence
$\{f_n\}$ of real-valued members of $C_{\mathrm{com}}(\mathbb{R}^N)$ with
pointwise limit $I_K$. Since $f_1^p$ is integrable, dominated convergence
yields $\lim_n \int_{\mathbb{R}^N} |f_n - I_K|^p\,d\mu = 0$. Hence the closure
of $C_{\mathrm{com}}(\mathbb{R}^N)$ in $L^p(\mathbb{R}^N)$ contains all
indicator functions of compact sets. By Proposition 5.55d this closure contains
the smallest closed subspace of $L^p(\mathbb{R}^N)$ containing all indicator
functions of compact sets. Conclusion (b) shows that the latter subspace is
$L^p(\mathbb{R}^N)$ itself. This proves (a).


Fix an integer $n \ge 0$, and let $(a_1, \dots, a_N)$ be an $N$-tuple of
integers. The diadic cube $Q_n(a_1, \dots, a_N)$ in $\mathbb{R}^N$ of side
$2^{-n}$ is defined to be the geometric rectangle

\[
Q_n(a_1, \dots, a_N) = \{(x_1, \dots, x_N) \mid 2^{-n} a_j < x_j \le
2^{-n}(a_j + 1) \text{ for } 1 \le j \le N\}.
\]

Let $\mathcal{Q}_n$ be the set of all diadic cubes of side $2^{-n}$. The members
of $\mathcal{Q}_n$ are disjoint and have union $\mathbb{R}^N$. Thus we can
associate uniquely to each $x$ in $\mathbb{R}^N$ a sequence $\{Q_n\}$ of diadic
cubes such that $x$ is in $Q_n$ and $Q_n$ is in $\mathcal{Q}_n$. Since for each
$n$, the members of $\mathcal{Q}_{n+1}$ are obtained by subdividing each member
of $\mathcal{Q}_n$ into $2^N$ disjoint smaller diadic cubes, the diadic cubes
$Q_n$ associated to $x$ must have the property that $Q_n \supseteq Q_{n+1}$ for
all $n \ge 0$.
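
As a small illustration of the definition, the following Python sketch computes
the index $(a_1, \dots, a_N)$ of the diadic cube of side $2^{-n}$ that contains
a given point and checks the nesting of the associated sequence of cubes. The
names `dyadic_index` and `parent_index` are hypothetical helpers introduced only
for this example, and the sketch assumes the half-open convention
$2^{-n} a_j < x_j \le 2^{-n}(a_j + 1)$ in each coordinate.

```python
import math
from fractions import Fraction

def dyadic_index(x, n):
    """Index (a_1, ..., a_N) of the cube Q_n containing x, assuming
    2**-n * a_j < x_j <= 2**-n * (a_j + 1) in every coordinate."""
    scale = 2 ** n
    # Exact rational arithmetic avoids rounding trouble at cube boundaries.
    return tuple(math.ceil(Fraction(xj) * scale) - 1 for xj in x)

def parent_index(a):
    """Index of the cube of side 2**-(n-1) that contains the cube Q_n(a)."""
    return tuple(aj // 2 for aj in a)

# The sequence of cubes associated to one point of R^2, for n = 0, 1, 2, 3.
x = (Fraction(3, 10), Fraction(-5, 4))
indices = [dyadic_index(x, n) for n in range(4)]
```

Here `parent_index(indices[n + 1]) == indices[n]` expresses the inclusion
$Q_n \supseteq Q_{n+1}$ for the cubes associated to `x`.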


Lemma 6.5. Any open set in $\mathbb{R}^N$ is the countable disjoint union of
diadic cubes.


PROOF. Let an open set $U$ be given. We may assume that $U \ne \mathbb{R}^N$,
so that $U^c \ne \varnothing$. We describe which diadic cubes to include in a
collection $\mathcal{A}$ so that $\mathcal{A}$ has the required properties. If
$x$ is in $U$, then $D(x, U^c) = d$ is positive since $U^c$ is closed and
nonempty. Let $\{Q_n\}$ be the sequence of diadic cubes associated to $x$. The
distance between any two points of $Q_n$ is $\le 2^{-n}\sqrt{N}$, and this is
$< d$ if $n$ is sufficiently large. Hence $Q_n$ is contained in $U$ for $n$
sufficiently large. The cube in $\mathcal{A}$ that contains $x$ is to be the
$Q_n$ with $n$ as small as possible so that $Q_n \subseteq U$.

The construction has been arranged so that the union of the diadic cubes in
$\mathcal{A}$ is exactly $U$. Suppose that $Q$ and $Q'$ are members of
$\mathcal{A}$ obtained from respective points $x$ and $x'$ in $U$. If
$Q \cap Q' \ne \varnothing$, let $x''$ be in the intersection. Then $Q$ and
$Q'$ are two of the diadic cubes in the sequence associated to $x''$, and one
has to contain the other. Without loss of generality, suppose that
$Q \supseteq Q'$. Then $x'$ lies in $Q$ as well as $Q'$, and we should have
selected $Q$ for $x'$ rather than $Q'$ if $Q \ne Q'$. We conclude that
$Q = Q'$, and thus the members of $\mathcal{A}$ are disjoint. Each collection
$\mathcal{Q}_n$ is countable, and therefore the collection $\mathcal{A}$ is
countable.
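
The selection rule in the proof can be sketched in Python for the special case
of an open interval $U = (\mathrm{lo}, \mathrm{hi})$ in $\mathbb{R}^1$. The
function name `selected_cube` is hypothetical, the restriction to one dimension
is made only to keep the example short, and the half-open convention
$2^{-n} a < x \le 2^{-n}(a + 1)$ for diadic intervals is assumed.

```python
import math
from fractions import Fraction

def selected_cube(x, lo, hi):
    """For x in the open interval U = (lo, hi), return (n, a) describing the
    diadic interval (a / 2**n, (a + 1) / 2**n] that contains x and lies in U,
    with n >= 0 as small as possible."""
    x, lo, hi = Fraction(x), Fraction(lo), Fraction(hi)
    assert lo < x < hi, "x must lie in the open interval"
    n = 0
    while True:
        a = math.ceil(x * 2 ** n) - 1
        left, right = Fraction(a, 2 ** n), Fraction(a + 1, 2 ** n)
        # (left, right] is inside (lo, hi) exactly when lo <= left, right < hi.
        if lo <= left and right < hi:
            return n, a
        n += 1

# Points of U = (0, 1) and the cubes selected for them.
cube_half = selected_cube(Fraction(1, 2), 0, 1)
cube_quarter = selected_cube(Fraction(1, 4), 0, 1)
cube_near_edge = selected_cube(Fraction(7, 8), 0, 1)
```

The points $1/4$ and $1/2$ select the same interval $(0, 1/2]$, while points
closer to the right endpoint of $U$ select smaller intervals, in line with the
disjointness argument above.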


Proposition 6.6. Any Borel measure on $\mathbb{R}^N$ is determined by its
values on all the diadic cubes.


REMARK. We shall apply this result in the present section in connection with
Lebesgue measure on $\mathbb{R}^N$ and in Section 8 in connection with general
Borel measures on $\mathbb{R}^1$.


PROOF. The values on the diadic cubes determine the values on all open sets
by Lemma 6.5, and the values on all open sets determine the values on all Borel
sets by Theorem 6.2.


Corollary 6.7. There exists a unique Borel measure on $\mathbb{R}^N$ for which
the measure of each geometric rectangle is the product of the lengths of the
sides. The measure is the $N$-fold product of 1-dimensional Lebesgue measure.


REMARKS. The uniqueness is immediate from Proposition 6.6. The first
version of Lebesgue measure that we constructed has the property stated in the
corollary and therefore proves existence. All the other versions of Lebesgue
measure we constructed have the same property, and so all such versions are
equal. The corollary therefore allows us to use Fubini’s Theorem for any
decomposition $\mathbb{R}^N = \mathbb{R}^m \times \mathbb{R}^n$ with
$m + n = N$. As in the 1-dimensional case, we shall often write $dx$ for
Lebesgue measure.


Corollary 6.7 gives one characterization of Lebesgue measure. We shall use 
Proposition 6.6 to give a second characterization, which will be in terms of 
translation invariance. 


Proposition 6.8. Under a Borel function $F : \mathbb{R}^N \to \mathbb{R}^{N'}$,
$F^{-1}(E)$ is in $\mathcal{B}_N$ whenever $E$ is in $\mathcal{B}_{N'}$. In
particular, this conclusion is valid if $F$ is continuous.


PROOF. The set of $E$’s for which $F^{-1}(E)$ is in $\mathcal{B}_N$ is a
$\sigma$-algebra, and the result will follow if this set of $E$’s contains the
open geometric rectangles of $\mathbb{R}^{N'}$. If $F_j$ denotes the
$j^{\mathrm{th}}$ component of $F$, then $F_j : \mathbb{R}^N \to \mathbb{R}^1$
is Borel measurable and Proposition 5.6c shows that $F_j^{-1}(U_j)$ is a Borel
set in $\mathbb{R}^N$ if $U_j$ is open in $\mathbb{R}^1$. Then
$F^{-1}(U_1 \times \cdots \times U_{N'}) = \bigcap_{j=1}^{N'} F_j^{-1}(U_j)$
is a Borel set in $\mathbb{R}^N$.


Corollary 6.9. Any homeomorphism of $\mathbb{R}^N$ carries $\mathcal{B}_N$ to
$\mathcal{B}_N$.


Corollary 6.9 is a special case of Proposition 6.8. The particular
homeomorphisms of interest at the moment are translations and dilations.
Translation by $x_0$ is the homeomorphism $\tau_{x_0}(x) = x + x_0$. Its
operation on a set $E$ is given by
$\tau_{x_0}(E) = \{\tau_{x_0}(x) \mid x \in E\} = \{x + x_0 \mid x \in E\}
= E + x_0$, and its operation on a function $f$ on $\mathbb{R}^N$ is given by
$\tau_{x_0}(f)(x) = f(\tau_{x_0}^{-1}(x)) = f(x - x_0)$. Its operation on an
indicator function $I_E$ is
$\tau_{x_0}(I_E)(x) = I_E(x - x_0) = I_{E + x_0}(x) = I_{\tau_{x_0}(E)}(x)$.
Because of Corollary 6.9, translations operate on measures, the formula being
$\tau_{x_0}(\mu)(E) = \mu(\tau_{x_0}^{-1}(E))$; since homeomorphisms carry
compact sets to compact sets, the right side is a Borel measure if $\mu$ is a
Borel measure. The actions of $\tau_{x_0}$ on functions and measures are
related by integration. If $f \ge 0$ is a Borel function, then so is
$\tau_{x_0}(f)$, and
$\int_{\mathbb{R}^N} f\,d(\tau_{x_0}(\mu)) = \int_{\mathbb{R}^N}
\tau_{x_0}^{-1}(f)\,d\mu$; this formula is verified by checking it for
indicator functions and then passing to simple functions $\ge 0$ by linearity
and to Borel functions $f \ge 0$ by monotone convergence.

Dilation $\delta_c$ by a nonzero real $c$ is given on members of $\mathbb{R}^N$
by $\delta_c(x) = cx$, and the operations on sets, functions, indicator
functions, and measures are analogous to the corresponding operations for
translations. Although dilations will play a recurring role in this book, the
notation $\delta_c$ will be used only in the present section.


Theorem 6.10. Lebesgue measure $m$ on $\mathbb{R}^N$ is translation invariant
in the sense that $\tau_{x_0}(m) = m$ for every $x_0$ in $\mathbb{R}^N$. In
fact, Lebesgue measure is the unique translation-invariant Borel measure on
$\mathbb{R}^N$ that assigns measure 1 to the diadic cube $Q_0(0, \dots, 0)$.
The effect of dilations on Lebesgue measure is that
$\delta_c(m) = |c|^{-N} m$, i.e.,
$\int_{\mathbb{R}^N} f(cx)\,dx = |c|^{-N} \int_{\mathbb{R}^N} f(x)\,dx$ for
every nonnegative Borel function $f$.


REMARKS. From one point of view, translation and dilation are examples
of bounded linear operators on each $L^p(\mathbb{R}^N, dx)$, with translation
preserving norms and with dilation multiplying norms by a constant depending on
$p$ and the particular dilation. From another point of view, translation and
dilation are especially simple examples of changes of variables. Operationally
the theorem allows us to write $dy = dx$ when $y = \tau_{x_0}(x)$ and
$dz = |c|^N\,dx$ when $z = cx$. These effects of translations and dilations on
integration with respect to Lebesgue measure are special cases of the general
change-of-variables formula to be proved in Section 5.


PROOF. For any $x_0$ in $\mathbb{R}^N$, $m$ and $\tau_{x_0}(m)$ assign the
product of the lengths of the sides as measure to any diadic cube. From
Proposition 6.6 we conclude that $m = \tau_{x_0}(m)$. The assertion about the
effect of dilations on Lebesgue measure is proved similarly.

We still have to prove the uniqueness. Let $\mu$ be a translation-invariant
Borel measure. The members of $\mathcal{Q}_n$ are translates of one another and
hence have equal $\mu$ measure. The members of $\mathcal{Q}_{n+1}$ are obtained
by partitioning each member of $\mathcal{Q}_n$ into $2^N$ members of
$\mathcal{Q}_{n+1}$ that are translates of one another. Thus the $\mu$ measure
of any member of $\mathcal{Q}_{n+1}$ is $2^{-N}$ times the $\mu$ measure of any
member of $\mathcal{Q}_n$. Consequently the $\mu$ measure of any diadic cube is
completely determined by the value of $\mu$ on $Q_0(0, \dots, 0)$, which is a
member of $\mathcal{Q}_0$. The uniqueness then follows by another application
of Proposition 6.6.
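
The dilation formula of Theorem 6.10 can be checked numerically in a simple
case. The sketch below takes $N = 1$, the function $f(x) = e^{-x^2}$, and
$c = -3$, all of which are choices made only for this illustration, and
approximates both sides of
$\int f(cx)\,dx = |c|^{-N} \int f(x)\,dx$ by a midpoint rule on $[-20, 20]$,
outside of which $f$ is negligible. The helper `midpoint_integral` is a
hypothetical name, and the computation is a sanity check rather than a proof.

```python
import math

def midpoint_integral(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def gaussian(x):
    return math.exp(-x * x)

N = 1        # dimension
c = -3.0     # nonzero dilation factor

# Left side: integral of f(cx); right side: |c|**(-N) times integral of f(x).
lhs = midpoint_integral(lambda x: gaussian(c * x), -20.0, 20.0)
rhs = abs(c) ** (-N) * midpoint_integral(gaussian, -20.0, 20.0)
```

Both quantities approximate $\sqrt{\pi}/3$, the sign of $c$ playing no role
because only $|c|$ enters the formula.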


For a continuous function on a closed bounded interval, it was shown at the 
end of Section V.3 that the Riemann integral equals the Lebesgue integral. The 
next proposition gives an N-dimensional analog. A general comparison of the 
Riemann and Lebesgue integrals will be given in Section 4. 


Proposition 6.11. For a continuous function on a compact geometric rectangle,
the Riemann integral equals the Lebesgue integral.


PROOF. The two are equal in the 1-dimensional case, and the $N$-dimensional
cases of each may be computed by iterated 1-dimensional integrals, as a result
of Corollary 3.33 in the case of the Riemann integral and as a result of the
definition of Lebesgue measure as a product and the use of Fubini’s Theorem
(Theorem 5.47) in the case of the Lebesgue integral.


So far, we have worked in this section only with Lebesgue measure on the 
Borel sets. The Lebesgue measurable sets are those sets that occur when 
Lebesgue measure is completed. The Lebesgue measurable sets of measure 0 
are of particular interest. In Section III.8 we defined an ostensibly different 
notion of measure 0 by saying that a set in $\mathbb{R}^N$ is of measure 0 if
for any $\varepsilon > 0$, it can be covered by a countable set of open
geometric rectangles of total volume less than $\varepsilon$, and Theorem 3.29
characterized the Riemann integrable functions on a compact geometric rectangle
as those functions whose discontinuities form a set of measure 0 in this sense.
Later, Proposition 5.39 showed for $\mathbb{R}^1$ that a set has measure 0 in
this sense if and only if it is Lebesgue measurable of Lebesgue measure 0. This
equivalence extends to $\mathbb{R}^N$, as the next proposition shows.


Proposition 6.12. In $\mathbb{R}^N$, the Lebesgue measurable sets of measure 0
are exactly the subsets $E$ of $\mathbb{R}^N$ with the following property: for
any $\varepsilon > 0$, the set $E$ can be covered by countably many geometric
rectangles of total volume less than $\varepsilon$.


PROOF. Let $m$ be Lebesgue measure on $\mathbb{R}^N$. If $E$ has the stated
property, let $E_n$ be the union of the given countable collection of geometric
rectangles of total volume $< 1/n$ used to cover $E$. Proposition 5.1g shows
that $m(E_n) < 1/n$, and hence the Borel set $E' = \bigcap_n E_n$ has
$m(E') < 1/n$ for every $n$. Therefore $m(E') = 0$. Since $E \subseteq E'$,
$E$ is Lebesgue measurable and has Lebesgue measure 0.

Conversely if $E$ is Lebesgue measurable of Lebesgue measure 0 and if
$\varepsilon > 0$ is given, we are to find a union of open geometric rectangles
containing $E$ and having total volume $< \varepsilon$. Find a set $E'$ in
$\mathcal{B}_N$ with $E \subseteq E'$ and $m(E') = 0$. It is enough to handle
$E'$. Writing $\mathbb{R}^N$ as the union of compact geometric cubes $C_n$ of
side $2n$ centered at the origin and covering $E' \cap C_n$ up to
$\varepsilon/2^n$, we see that we may assume that $E'$ is bounded, being
contained in some cube $C_n$.

Within $\mathbb{R}^1 \cap [-n, n]$, we know that the set of finite unions of
intervals is an algebra $\mathcal{A}_1^{(n)}$ of sets such that
$\mathcal{B}_1^{(n)} = \mathcal{B}_1 \cap [-n, n]$ is the smallest
$\sigma$-algebra containing $\mathcal{A}_1^{(n)}$. Applying Proposition 5.40
inductively, we see that the set of finite disjoint unions of $N$-fold products
of members of $\mathcal{A}_1^{(n)}$ is an algebra $\mathcal{A}_N^{(n)}$, and
then Proposition 6.1 shows that the smallest $\sigma$-algebra containing
$\mathcal{A}_N^{(n)}$ is $\mathcal{B}_N^{(n)} = \mathcal{B}_N \cap C_n$.
Proposition 5.38 shows that the measure $m$ on $\mathcal{B}_N^{(n)}$ is given
by $m^*$, where $m^*(A)$ is the infimum of the measures of countable unions of
members of $\mathcal{A}_N^{(n)}$ that cover $A$. Consequently the subset $E'$
of $C_n$ can be covered by countably many geometric rectangles of total volume
$< \varepsilon$. Doubling these rectangles about their centers and discarding
their edges, we obtain a covering of $E'$ by open rectangles of total volume
$< 2^N \varepsilon$, and we have the required covering.


Borel measurable sets have two distinct advantages over Lebesgue measurable 
sets. One advantage is that Borel measurable sets are independent of the particular 
Borel measure in question, whereas the sets in the completion of a o-algebra 
relative to a Borel measure very much depend on the particular measure. The other 
advantage is that Fubini’s Theorem applies in a tidy fashion to Borel measurable
functions as a consequence of the identity
$\mathcal{B}_m \times \mathcal{B}_n = \mathcal{B}_{m+n}$ given in Proposition
6.1. By contrast, there are Lebesgue measurable sets for $\mathbb{R}^N$ that are
not in the product of the $\sigma$-algebras of Lebesgue measurable sets from
$\mathbb{R}^m$ and $\mathbb{R}^{N-m}$. For example, take a set $E$ in
$\mathbb{R}^1$ that is not Lebesgue measurable; such a set is produced in
Problem 1 at the end of the present chapter. Then $E \times \{0\}$ in
$\mathbb{R}^2$ is a subset of the Borel set $\mathbb{R}^1 \times \{0\}$, and
hence it is Lebesgue measurable of measure 0.
However, E x {0} is not in the product o-algebra, because a section of a function 
measurable with respect to the product has to be measurable with respect to the 
appropriate factor (Lemma 5.46). 

On the other hand, Lebesgue measurable functions are sometimes unavoidable. 
An example occurs with Riemann integrability: In view of Proposition 6.12, 
Theorem 3.29 says that the Riemann integrable functions on a compact geometric 
rectangle are exactly the functions whose discontinuities form a Lebesgue mea- 
surable set of Lebesgue measure 0, and Problems 31-33 at the end of Chapter V 
produced such a function in the 1-dimensional case that is not a Borel measurable 
function. 

The upshot is that a little care is needed when using Fubini’s Theorem and 
Lebesgue measurable sets at the same time, and there are times when one wants 
to do so. The situation is a little messy but not intractable. Problem 12 at the 
end of Chapter V showed that a Lebesgue measurable function can be adjusted 
on a set of Lebesgue measure 0 so as to become Borel measurable. Using this 
fact, one can write down a form of Fubini’s Theorem for Lebesgue measurable 
functions that is usable even if inelegant. 


2. Convolution 


Convolution is an important operation available for functions on
$\mathbb{R}^N$. On a formal level, the convolution $f * g$ of two functions $f$
and $g$ is

\[
(f * g)(x) = \int_{\mathbb{R}^N} f(x - y)\,g(y)\,dy.
\]

One place convolution arises is as a limit of a linear combination of
translates: We shall see in Proposition 6.13 that the convolution at $x$ may be
written also as $\int_{\mathbb{R}^N} f(y)\,g(x - y)\,dy$. If $f$ is fixed and if
finite sets of translation operators $\tau_{y_i}$ and of weights $f(y_i)$ are
given, then the linear combination $\sum_i f(y_i)\,\tau_{y_i}$, applied to $g$
and evaluated at $x$, is $\sum_i f(y_i)\,g(x - y_i)$. Corollary 6.17 will show
a sense in which we can think of $\int_{\mathbb{R}^N} f(y)\,g(x - y)\,dy$ as a
limit of such expressions.

To make mathematical sense out of $f * g$, let us begin with the case that $f$
and $g$ are nonnegative Borel functions on $\mathbb{R}^N$. The assertion is
that $f * g$ is meaningful as a Borel function $\ge 0$. In fact,
$(x, y) \mapsto f(x - y)$ is the composition of the continuous function
$F : \mathbb{R}^{2N} \to \mathbb{R}^N$ given by $F(x, y) = x - y$, followed by
the Borel function $f : \mathbb{R}^N \to [0, +\infty]$. If $U$ is open in
$[0, +\infty]$, then $f^{-1}(U)$ is in $\mathcal{B}_N$, and Proposition 6.8
shows that $(f \circ F)^{-1}(U) = F^{-1}(f^{-1}(U))$ is in $\mathcal{B}_{2N}$.
Then the product $(x, y) \mapsto f(x - y)\,g(y)$ is a Borel function, and
Fubini’s Theorem (Theorem 5.47) and Proposition 6.1 combine to show that
$x \mapsto (f * g)(x)$ is a Borel function $\ge 0$.


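Proposition 6.13. For nonnegative Borel functions on $\mathbb{R}^N$,

(a) $f * g = g * f$,
(b) $f * (g * h) = (f * g) * h$.


PROOF. We use Theorem 6.10 for both parts and also Fubini’s Theorem for
(b). For (a), the changes of variables $y \mapsto y + x$ and then
$y \mapsto -y$ give

\[
\int_{\mathbb{R}^N} f(x - y)\,g(y)\,dy
= \int_{\mathbb{R}^N} f(-y)\,g(y + x)\,dy
= \int_{\mathbb{R}^N} f(y)\,g(x - y)\,dy.
\]

For (b), the computation is

\begin{align*}
(f * (g * h))(x)
&= \int_{\mathbb{R}^N} f(x - y)\,(g * h)(y)\,dy \\
&= \int_{\mathbb{R}^N} \Big[ \int_{\mathbb{R}^N} f(x - y)\,g(y - z)\,h(z)\,dz \Big]\,dy \\
&= \int_{\mathbb{R}^N} \Big[ \int_{\mathbb{R}^N} f(x - y)\,g(y - z)\,h(z)\,dy \Big]\,dz \\
&= \int_{\mathbb{R}^N} \Big[ \int_{\mathbb{R}^N} f(x - z - y)\,g(y)\,h(z)\,dy \Big]\,dz \\
&= \int_{\mathbb{R}^N} (f * g)(x - z)\,h(z)\,dz = ((f * g) * h)(x),
\end{align*}

the change of variables $y \mapsto y + z$ being used for the fourth equality.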
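
A discrete analog may help in seeing the two identities at work. In the Python
sketch below, functions on $\mathbb{R}^N$ are replaced by finitely supported
nonnegative sequences indexed by $0, 1, 2, \dots$, and the integral is replaced
by a finite sum; this substitution is an assumption made only for illustration
and is not the setting of the proposition. The function name `convolve` is
hypothetical.

```python
def convolve(f, g):
    """Discrete convolution of finitely supported sequences:
    (f * g)[x] = sum over y of f[x - y] * g[y]."""
    out = [0] * (len(f) + len(g) - 1)
    for x in range(len(out)):
        for y in range(len(g)):
            if 0 <= x - y < len(f):
                out[x] += f[x - y] * g[y]
    return out

f, g, h = [1, 2, 0, 3], [4, 0, 1], [2, 5]

fg = convolve(f, g)                        # analog of f * g
gf = convolve(g, f)                        # analog of g * f
left = convolve(f, convolve(g, h))         # analog of f * (g * h)
right = convolve(convolve(f, g), h)        # analog of (f * g) * h
```

With integer entries the comparisons `fg == gf` and `left == right` are exact,
mirroring (a) and (b).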


In order to have a well-defined expression for $f * g$ when $f$ and $g$ are not
necessarily $\ge 0$, we need conditions under which the nonnegative case leads
to something finite. The conditions we use ensure finiteness of
$(|f| * |g|)(x)$ for almost every $x$. For real-valued $f$ and $g$, we then
define $(f * g)(x)$ by subtraction at the points where $(|f| * |g|)(x)$ is
finite, and we define it to be 0 elsewhere. For complex-valued $f$ and $g$, we
define $(f * g)(x)$ as a linear combination of the appropriate parts where
$(|f| * |g|)(x)$ is finite, and we define it to be 0 elsewhere. When we proceed
this way, the commutativity and associativity properties in Proposition 6.13
will be valid even though $f$ and $g$ are not necessarily $\ge 0$.


Proposition 6.14. For nonnegative Borel functions $f$ and $g$ on
$\mathbb{R}^N$, convolution is finite almost everywhere in the following cases,
and then the indicated inequalities of norms are satisfied:

(a) for $f$ in $L^1(\mathbb{R}^N)$ and $g$ in $L^1(\mathbb{R}^N)$, and then
    $\|f * g\|_1 \le \|f\|_1 \|g\|_1$,
(b) for $f$ in $L^1(\mathbb{R}^N)$ and $g$ in $L^2(\mathbb{R}^N)$, and then
    $\|f * g\|_2 \le \|f\|_1 \|g\|_2$,
    for $f$ in $L^2(\mathbb{R}^N)$ and $g$ in $L^1(\mathbb{R}^N)$, and then
    $\|f * g\|_2 \le \|f\|_2 \|g\|_1$,
(c) for $f$ in $L^1(\mathbb{R}^N)$ and $g$ in $L^\infty(\mathbb{R}^N)$, and
    then $\|f * g\|_\infty \le \|f\|_1 \|g\|_\infty$,
    for $f$ in $L^\infty(\mathbb{R}^N)$ and $g$ in $L^1(\mathbb{R}^N)$, and
    then $\|f * g\|_\infty \le \|f\|_\infty \|g\|_1$,
(d) for $f$ in $L^2(\mathbb{R}^N)$ and $g$ in $L^2(\mathbb{R}^N)$, and then
    $\|f * g\|_\infty \le \|f\|_2 \|g\|_2$.

Consequently $f * g$ is defined in the above situations even if the
scalar-valued functions $f$ and $g$ are not necessarily $\ge 0$, and the
estimates on the norm of $f * g$ are still valid.


PROOF. For (a) and the first conclusions in (b) and (c), let $p$ be 1, 2, or
$\infty$ as appropriate. By Minkowski’s inequality for integrals (Theorem
5.60),

\begin{align*}
\|f * g\|_p
&= \Big\| \int_{\mathbb{R}^N} f(y)\,g(x - y)\,dy \Big\|_{p,x}
 \le \int_{\mathbb{R}^N} \| f(y)\,g(x - y) \|_{p,x}\,dy \\
&= \int_{\mathbb{R}^N} |f(y)|\,\|g(x - y)\|_{p,x}\,dy
 = \int_{\mathbb{R}^N} |f(y)|\,\|g\|_p\,dy = \|f\|_1 \|g\|_p,
\end{align*}

the next-to-last equality following from the translation invariance of $dx$.
The second conclusions in (b) and (c) require only notational changes.

For (d), we have

\[
\sup_x |(f * g)(x)| = \sup_x \Big| \int_{\mathbb{R}^N} f(y)\,g(x - y)\,dy \Big|
\le \sup_x \|f\|_2\,\|g(x - y)\|_{2,y} = \|f\|_2 \|g\|_2,
\]

the inequality following from the Schwarz inequality and the last step
following from translation invariance of $dy$ and invariance under
$y \mapsto -y$.

Going over these arguments, we see that we may use them even if $f$ and $g$ are
not necessarily $\ge 0$. Then the last statement of the proposition follows.
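
The norm inequalities can be tried out on a discrete analog, in which
finitely supported sequences replace functions on $\mathbb{R}^N$ and sums
replace integrals. This replacement is an assumption made only for the sake of
a runnable example, and the names `convolve` and `norm` are hypothetical; the
analogous inequalities for sequences are checked here on one sample pair, which
illustrates but does not prove anything.

```python
import math

def convolve(f, g):
    """Discrete convolution: (f * g)[x] = sum over y of f[x - y] * g[y]."""
    out = [0] * (len(f) + len(g) - 1)
    for x in range(len(out)):
        for y in range(len(g)):
            if 0 <= x - y < len(f):
                out[x] += f[x - y] * g[y]
    return out

def norm(seq, p):
    """Sequence analog of the L^p norm for p = 1, 2, or math.inf."""
    if p == math.inf:
        return max(abs(v) for v in seq)
    return sum(abs(v) ** p for v in seq) ** (1 / p)

f, g = [1, -2, 0, 3], [4, 0, -1]
fg = convolve(f, g)

# Pairs (left side, right side) for the analogs of (a), (b), (c), (d).
checks = {
    "a": (norm(fg, 1), norm(f, 1) * norm(g, 1)),
    "b": (norm(fg, 2), norm(f, 1) * norm(g, 2)),
    "c": (norm(fg, math.inf), norm(f, 1) * norm(g, math.inf)),
    "d": (norm(fg, math.inf), norm(f, 2) * norm(g, 2)),
}
```

In each entry of `checks` the first number should not exceed the second, up to
floating-point rounding.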


Next let us relate the translation operators of Section 1 to convolution. The
formula for the effect of a translation operator on a function is
$\tau_t(f)(x) = f(x - t)$.


Proposition 6.15. Convolution commutes with translations in the sense that
$\tau_t(f * g) = (\tau_t f) * g = f * (\tau_t g)$.


PROOF. It is enough to treat functions $\ge 0$. Then we have
$\tau_t(f * g)(x) = (f * g)(x - t) = \int_{\mathbb{R}^N} f(x - t - y)\,g(y)\,dy$,
which equals
$\int_{\mathbb{R}^N} (\tau_t f)(x - y)\,g(y)\,dy = ((\tau_t f) * g)(x)$ on the
one hand and, because of translation invariance of Lebesgue measure, equals
$\int_{\mathbb{R}^N} f(x - y)\,g(y - t)\,dy = (f * \tau_t g)(x)$ on the other
hand.


Proposition 6.16. If $p = 1$ or $p = 2$, then translation of a function is
continuous in the translation parameter in $L^p(\mathbb{R}^N, dx)$. In other
words, if $f$ is in $L^p$ relative to Lebesgue measure, then
$\lim_{h \to 0} \|\tau_{t+h} f - \tau_t f\|_p = 0$ for all $t$.


REMARK. However, continuity fails on $L^\infty$. In this case, there is a
substitute result, and we take that up in a moment.

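PROOF. Let $f$ be in $L^p$. By translation invariance of Lebesgue measure,
$\|\tau_{t+h} f - \tau_t f\|_p = \|\tau_h f - f\|_p$. If $g$ is in
$C_{\mathrm{com}}(\mathbb{R}^N)$, then
$\|\tau_h g - g\|_p^p = \int_{\mathbb{R}^N} |g(x - h) - g(x)|^p\,dx$, and
dominated convergence shows that this tends to 0 as $h$ tends to 0. Let
$\varepsilon > 0$ and $f$ be given. By Corollary 6.4a,
$C_{\mathrm{com}}(\mathbb{R}^N)$ is dense in $L^p(\mathbb{R}^N, dx)$, and thus
we can choose $g$ in $C_{\mathrm{com}}(\mathbb{R}^N)$ with
$\|f - g\|_p < \varepsilon$. Then

\begin{align*}
\|\tau_h f - f\|_p
&\le \|\tau_h f - \tau_h g\|_p + \|\tau_h g - g\|_p + \|g - f\|_p \\
&= 2\|f - g\|_p + \|\tau_h g - g\|_p < 2\varepsilon + \|\tau_h g - g\|_p.
\end{align*}

If $h$ is close enough to 0, the term $\|\tau_h g - g\|_p$ is $< \varepsilon$,
and then $\|\tau_h f - f\|_p < 3\varepsilon$.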


Corollary 6.17. Let $p = 1$ or $p = 2$, and let $g_1, \dots, g_r$ be finitely
many functions in $L^p(\mathbb{R}^N)$. If a positive number $\varepsilon$ and a
function $f$ in $L^1(\mathbb{R}^N)$ are given, then there exist finitely many
members $y_j$ of $\mathbb{R}^N$, $1 \le j \le n$, and constants $c_j$ such that
$\big\| f * g_k - \sum_{j=1}^{n} c_j \tau_{y_j} g_k \big\|_p < \varepsilon$
for $1 \le k \le r$.


REMARK. In the case $r = 1$, the corollary says that any convolution $f * g$
can be approximated in $L^p$ by a linear combination of translates of $g$. The
result will be used in Chapter VII with $r > 1$.


PROOF. Let $V$ be the set of functions $f$ in $L^1(\mathbb{R}^N)$ for which
this kind of approximation is possible for every $\varepsilon > 0$. The main
step is to show that $V$ contains the indicator functions of the compact sets
in $\mathbb{R}^N$. Let $K$ be compact, and let $I_K$ be its indicator function.
Proposition 6.16 shows that the functions $y \mapsto \tau_y g_k$ are continuous
from $K$ into $L^p(\mathbb{R}^N)$ for $1 \le k \le r$, and therefore these
functions are uniformly continuous. Fix $\varepsilon > 0$, and let $\delta > 0$
be such that $\|\tau_y g_k - \tau_{y'} g_k\|_p < \varepsilon$ for all $k$
whenever $|y - y'| < \delta$ and $y$ and $y'$ are in $K$. For each $y$ in $K$,
form the open ball $B(\delta; y)$ in $\mathbb{R}^N$. These balls cover $K$, and
finitely many suffice; let their centers be $y_1, \dots, y_n$. Define sets
$S_1, \dots, S_n$ inductively as follows: $S_j$ is the subset of $K$ where
$|y - y_j| < \delta$ but $|y - y_i| \ge \delta$ for $i < j$. Then
$K = \bigcup_{j=1}^{n} S_j$ disjointly. By the choice of $\delta$, we have
$\|\tau_y g_k - \tau_{y_j} g_k\|_p < \varepsilon$ for all $y$ in $S_j$ and all
$k$. Using Minkowski’s inequality for integrals (Theorem 5.60), and writing $m$
for Lebesgue measure, we have

\begin{align*}
\big\| I_{S_j} * g_k - m(S_j)\,\tau_{y_j} g_k \big\|_p
&= \Big\| \int_{S_j} \big( g_k(x - y) - g_k(x - y_j) \big)\,dy \Big\|_{p,x} \\
&\le \int_{S_j} \| g_k(x - y) - g_k(x - y_j) \|_{p,x}\,dy \\
&\le \varepsilon\, m(S_j).
\end{align*}

Summing over $j$ gives

\[
\Big\| I_K * g_k - \sum_{j=1}^{n} m(S_j)\,\tau_{y_j} g_k \Big\|_p
\le \varepsilon\, m(K).
\]

Since $\varepsilon$ is arbitrary, $I_K$ lies in $V$.

If $f_1$ and $f_2$ are in $V$ and if $g_1, \dots, g_r$ are given, then we may
assume, by taking the union of the sets of members $y_j$ of $\mathbb{R}^N$ and
by setting any unnecessary constants $c_j$ equal to 0, that the translates used
for $f_1$ and $f_2$ with the same $\varepsilon > 0$ are the same. Thus we can
write $\big\| f_1 * g_k - \sum_j c_j \tau_{y_j} g_k \big\|_p < \varepsilon/2$
and $\big\| f_2 * g_k - \sum_j d_j \tau_{y_j} g_k \big\|_p < \varepsilon/2$
for suitable $y_j$’s, $c_j$’s, and $d_j$’s, and the triangle inequality gives
$\big\| (f_1 + f_2) * g_k - \sum_j (c_j + d_j) \tau_{y_j} g_k \big\|_p <
\varepsilon$. Hence $V$ is closed under addition. Similarly $V$ is closed under
scalar multiplication. If $f_l \to f$ in $L^1$ with $f_l$ in $V$ and if
$\varepsilon > 0$ is given, choose $l$ large enough so that
$\|f - f_l\|_1 < \varepsilon / (2 \max_k \|g_k\|_p)$. If
$\big\| f_l * g_k - \sum_j c_j \tau_{y_j} g_k \big\|_p < \varepsilon/2$, then
the inequality $\|f * g_k - f_l * g_k\|_p \le \|f - f_l\|_1 \|g_k\|_p$ and the
triangle inequality together give
$\big\| f * g_k - \sum_j c_j \tau_{y_j} g_k \big\|_p < \varepsilon$. Hence $f$
is in $V$, and $V$ is closed.

By Corollary 6.4b, $V = L^1(\mathbb{R}^N)$, and the proof is complete.


In some cases with $L^\infty(\mathbb{R}^N)$, results have more content when
phrased in terms of the supremum norm
$\|f\|_{\sup} = \sup_{x \in \mathbb{R}^N} |f(x)|$ defined in Section V.9. For a
continuous function $f$, the two norms agree because the set where
$|f(x)| > M$ is open and therefore has positive measure if it is nonempty. For
a bounded function $f$, the condition
$\lim_{h \to 0} \|\tau_h f - f\|_{\sup} = 0$ is equivalent to uniform
continuity of $f$, basically by definition. The functions $f$ in $L^\infty$ for
which $\lim_{h \to 0} \|\tau_h f - f\|_\infty = 0$ are not much more general
than the bounded uniformly continuous functions; we shall see shortly that they
can be adjusted on a set of measure 0 so as to be bounded and uniformly
continuous.


Proposition 6.18. In $\mathbb{R}^N$ with Lebesgue measure, the convolution of
$L^1$ with $L^\infty$, or of $L^\infty$ with $L^1$, or of $L^2$ with $L^2$
results in an everywhere-defined bounded uniformly continuous function, not
just an $L^\infty$ function. Moreover,

\[
\|f * g\|_{\sup} \le \|f\|_1 \|g\|_\infty, \qquad
\|f * g\|_{\sup} \le \|f\|_\infty \|g\|_1, \qquad\text{or}\qquad
\|f * g\|_{\sup} \le \|f\|_2 \|g\|_2
\]

in the various cases.


PROOF. We give the proof when $f$ is in $L^1$ and $g$ is in $L^\infty$, the
other cases being handled similarly. The bound follows from the computation

\[
\|f * g\|_{\sup}
= \sup_x \Big| \int_{\mathbb{R}^N} f(x - y)\,g(y)\,dy \Big|
\le \sup_x \|g\|_\infty \int_{\mathbb{R}^N} |f(x - y)|\,dy
= \|f\|_1 \|g\|_\infty.
\]

For uniform continuity we use Proposition 6.15 and the bound
$\|f * g\|_{\sup} \le \|f\|_1 \|g\|_\infty$ to make the estimate

\begin{align*}
\|\tau_h(f * g) - (f * g)\|_{\sup}
&= \|(\tau_h f) * g - f * g\|_{\sup} \\
&= \|(\tau_h f - f) * g\|_{\sup} \le \|\tau_h f - f\|_1 \|g\|_\infty,
\end{align*}

and then we apply Proposition 6.16 to see that the right side tends to 0 as $h$
tends to 0.


A corollary of Proposition 6.18 gives a first look at how differentiability 
interacts with convolution. 


Corollary 6.19. Suppose that $f$ is a compactly supported function of class
$C^n$ on $\mathbb{R}^N$ and that $g$ is in $L^p(\mathbb{R}^N, dx)$ with $p$
equal to 1, 2, or $\infty$. Then $f * g$ is of class $C^n$, and
$D(f * g) = (Df) * g$ for any iterated partial derivative $D$ of order
$\le n$.


PROOF. First suppose that $n = 1$. Fix $j$ with $1 \le j \le N$, and put
$D_j = \partial/\partial x_j$. The function $(D_j f) * g$ is continuous by
Proposition 6.18. If we can prove that $D_j(f * g)(x)$ exists and equals
$((D_j f) * g)(x)$ for each $x$, then it will follow that $D_j(f * g)$ is
continuous. This fact for all $j$ implies that $f * g$ is of class $C^1$ by
Theorem 3.7, and the result for $n = 1$ will have been proved. The result for
higher $n$ can then be obtained by iterating the result for $n = 1$.

Thus we are to prove that $D_j(f * g)(x)$ exists and equals
$((D_j f) * g)(x)$ for each $x$. In the respective cases $p = 1, 2, \infty$,
put $p' = \infty, 2, 1$. Let $e_j$ be the $j^{\mathrm{th}}$ standard basis
vector of $\mathbb{R}^N$ and let $h$ be real with $|h| < 1$. Proposition 6.15
gives

\[
h^{-1}\big( (f * g)(x + h e_j) - (f * g)(x) \big)
= \big( h^{-1}(\tau_{-h e_j} f - f) \big) * g\,(x). \tag{$*$}
\]

Proposition 3.28a shows that $h^{-1}(\tau_{-h e_j} f - f)$ converges uniformly,
as $h \to 0$, to $D_j f$ on any compact set; since the support is compact,
$h^{-1}(\tau_{-h e_j} f - f)$ converges uniformly to $D_j f$ on $\mathbb{R}^N$.
Hence the convergence occurs in $L^\infty$, and dominated convergence shows
that it occurs in $L^1$ and $L^2$ also. Combining Proposition 6.18 and (*), we
see that

\[
\big| h^{-1}\big( (f * g)(x + h e_j) - (f * g)(x) \big) - ((D_j f) * g)(x) \big|
\le \big\| h^{-1}(\tau_{-h e_j} f - f) - D_j f \big\|_{p'}\, \|g\|_p.
\]

The right side tends to 0 as $h \to 0$, and thus indeed $D_j(f * g)(x)$ exists
and equals $((D_j f) * g)(x)$.


Twice in Chapter I we made use of an “approximate identity” in $\mathbb{R}^1$,
a system of functions peaking at the origin such that convolution by these
functions acts more and more like the identity operator on some class of
functions. The first occasion of this kind was in Section I.9 in connection
with the Weierstrass Approximation Theorem, where the functions in the system
were $\varphi_n(x) = c_n(1 - x^2)^n$ on $[-1, 1]$ with the constants $c_n$
chosen to make the total integral be 1. The polynomials $\varphi_n$ had the
properties

(i) $\varphi_n(x) \ge 0$,
(ii) $\int_{-1}^{1} \varphi_n(x)\,dx = 1$,
(iii) for any $\delta > 0$, $\sup_{\delta \le |x| \le 1} \varphi_n(x)$ tends
to 0 as $n$ tends to infinity,

and the convolutions were with continuous functions $f$ such that
$f(0) = f(1) = 0$ and $f$ vanishes outside $[0, 1]$. The second occasion was in
Section I.10 in connection with Fejér’s Theorem, where the functions in the
system were trigonometric polynomials $K_N(x)$ such that

(i) $K_N(x) \ge 0$,
(ii) $\frac{1}{2\pi} \int_{-\pi}^{\pi} K_N(x)\,dx = 1$,
(iii) for any $\delta > 0$, $\sup_{\delta \le |x| \le \pi} K_N(x)$ tends to 0
as $N$ tends to infinity.

In this case the convolutions were with periodic functions of period $2\pi$
over an interval of length $2\pi$, and the integrations involved
$\frac{1}{2\pi}\,dx$ instead of $dx$.

Now we shall use the dilations of a single function in order to produce a more
robust kind of approximate identity, this time on $\mathbb{R}^N$. One sense in
which convolution by this system acts more and more like the identity appears
in Theorem 6.20 below, and a sample application appears in Corollary 6.21. The
corollary will illustrate how one can use an approximate identity to pass from
conclusions about nice functions in some class to conclusions about all
functions in the class.


Theorem 6.20. Let $\varphi$ be in $L^1(\mathbb{R}^N, dx)$, not necessarily
$\ge 0$. Define

\[
\varphi_\varepsilon(x) = \varepsilon^{-N} \varphi(\varepsilon^{-1} x)
\qquad\text{for } \varepsilon > 0,
\]

and put $c = \int_{\mathbb{R}^N} \varphi(x)\,dx$. Then the following hold:

(a) if $p = 1$ or $p = 2$ and if $f$ is in $L^p(\mathbb{R}^N, dx)$, then
$\lim_{\varepsilon \downarrow 0} \|\varphi_\varepsilon * f - cf\|_p = 0$,

(b) the conclusion in (a) is valid for $p = \infty$ if $f$ is in
$L^\infty(\mathbb{R}^N, dx)$ and $\lim_{t \to 0} \|\tau_t f - f\|_\infty = 0$,

(c) if $f$ is bounded on $\mathbb{R}^N$ and is continuous at $x$, then
$\lim_{\varepsilon \downarrow 0} (\varphi_\varepsilon * f)(x) = cf(x)$,

(d) the convergence in (c) is uniform for any set $E$ of $x$’s such that $f$ is
uniformly continuous at the points of $E$.


Corollary 6.21. If $f$ in $L^\infty(\mathbb{R}^N, dx)$ satisfies
$\lim_{t \to 0} \|\tau_t f - f\|_\infty = 0$, then $f$ can be adjusted on a set
of measure 0 so as to be uniformly continuous.


PROOF. Let $\varphi$ be a member of $C_{\mathrm{com}}(\mathbb{R}^N)$ such that
$\int_{\mathbb{R}^N} \varphi(x)\,dx = 1$. Fix a sequence $\{\varepsilon_n\}$
decreasing to 0 in $\mathbb{R}^1$. Proposition 6.18 shows that
$\varphi_{\varepsilon_n} * f$ is bounded and uniformly continuous for every
$n$, and Theorem 6.20 shows that $\{\varphi_{\varepsilon_n} * f\}$ is Cauchy in
$L^\infty$. Since the $L^\infty$ and supremum norms coincide for continuous
functions, $\{\varphi_{\varepsilon_n} * f\}$ is uniformly Cauchy and must
therefore be uniformly convergent. Let $g$ be the limit function, which is
necessarily bounded and uniformly continuous. Then
$\|f - g\|_\infty \le \|f - \varphi_{\varepsilon_n} * f\|_\infty +
\|\varphi_{\varepsilon_n} * f - g\|_\infty$, and both terms on the right tend
to 0 as $n$ tends to infinity. Consequently $\|f - g\|_\infty = 0$, and $g$ is
a bounded uniformly continuous function that differs from $f$ only on a set of
measure 0.
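

The convergence in Theorem 6.20c can be observed numerically in a simple case.
The sketch below assumes $N = 1$, takes $\varphi$ to be the indicator function
of $[-1/2, 1/2]$, so that $c = 1$ and $(\varphi_\varepsilon * f)(x)$ is the
average of $f$ over $[x - \varepsilon/2,\, x + \varepsilon/2]$, and takes
$f = \cos$; these choices and the helper name `phi_eps_conv` are made only for
illustration, and the average is approximated by a midpoint rule.

```python
import math

def phi_eps_conv(f, x, eps, steps=1000):
    """Approximate (phi_eps * f)(x) for phi the indicator of [-1/2, 1/2] in
    R^1, i.e. (1/eps) times the integral of f over [x - eps/2, x + eps/2],
    using the midpoint rule with the given number of steps."""
    h = eps / steps
    total = sum(f(x - eps / 2 + (k + 0.5) * h) for k in range(steps))
    return total * h / eps

x = 0.7
# Distance from (phi_eps * f)(x) to c * f(x) = f(x) for shrinking eps.
errors = [abs(phi_eps_conv(math.cos, x, eps) - math.cos(x))
          for eps in (1.0, 0.1, 0.01)]
```

The entries of `errors` shrink as `eps` decreases, which is the behavior that
conclusion (c) predicts at a point of continuity of a bounded function.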


3. Borel Measures on Open Sets 


A number of results in Sections 1-2 about Borel measures on $\mathbb{R}^N$
extend to suitably defined Borel measures on arbitrary nonempty open subsets
$V$ of $\mathbb{R}^N$, and we shall collect some of these results here in order
to do two things: to prepare for the proof in Section 5 of the
change-of-variables formula for the Lebesgue integral in $\mathbb{R}^N$ and to
provide motivation for the treatment in Chapter XI of Borel measures on locally
compact Hausdorff spaces.

Throughout this section, let $V$ be a nonempty open subset of $\mathbb{R}^N$.
We shall make use of the following lemma that generalizes to $V$ three
properties (i-iii) listed for $\mathbb{R}^N$ before Theorem 6.2 and Corollary
6.3. Let $C_{\mathrm{com}}(V)$ be the vector space of scalar-valued continuous
functions on $V$ of compact support in $V$. If nothing is said to the contrary,
the scalars may be either real or complex.


Lemma 6.22.

(a) There exists a sequence $\{F_n\}_{n=1}^\infty$ of compact subsets of $V$ with union $V$
such that $F_n \subseteq F_{n+1}^o$ for all $n$.

(b) For any compact subset $K$ of $V$, there exists a decreasing sequence of open
sets $U_n$ with compact closure in $V$ such that $\bigcap_{n=1}^\infty U_n = K$.

(c) For any compact subset $K$ of $V$, there exists a decreasing sequence of
functions in $C_{\mathrm{com}}(V)$ with values in $[0, 1]$ and with pointwise limit the indicator
function of $K$.


PROOF. In (a), the case $V = \mathbb{R}^N$ was handled by (i) before Theorem 6.2. For
$V \ne \mathbb{R}^N$, we can take $F_n = \{x \in V \mid D(x, V^c) \ge 1/n \text{ and } |x| \le n\}$ as long
as $n$ is $\ge$ some suitable $n_0$. We complete the definition of the $F_n$'s by taking
$F_1 = \cdots = F_{n_0-1} = F_{n_0}$.

In (b), the case $V = \mathbb{R}^N$ was handled by (ii) before Theorem 6.2. For $V \ne \mathbb{R}^N$,
every $x$ in $K$ has $D(x, V^c) > 0$ since $V^c$ is closed and is disjoint from $K$.
The function $D(\,\cdot\,, V^c)$ is continuous and therefore has a positive minimum on
$K$. Choose $n_0$ such that $D(x, V^c) > 1/n_0$ for $x$ in $K$, i.e., $|x - y| \ge 1/n_0$
for all $x \in K$ and $y \in V^c$. Then $D(y, K) \ge 1/n_0$ if $y$ is not in $V$. Let
$U_n = \{y \in \mathbb{R}^N \mid D(y, K) < 1/n\}$ for $n > n_0$. This is an open set containing
$K$, and its closure in $\mathbb{R}^N$ is contained in the set where $D(y, K) \le 1/n$, which in
turn is contained in $V$. The set where $D(y, K) \le 1/n$ is closed and bounded in
$\mathbb{R}^N$ and hence is compact. Therefore $U_n^{\mathrm{cl}}$ is contained in a compact subset of $V$.
We complete the definition of the $U_n$'s by letting $U_1, \dots, U_{n_0}$ all equal $U_{n_0+1}$.

For (c), we argue as with (iii) before Corollary 6.3. Choose open sets $U_n$ as in
(b) that decrease and have intersection $K$, and apply Proposition 2.30e to obtain
continuous functions $f_n : \mathbb{R}^N \to [0,1]$ such that $f_n$ is 1 on $K$ and is 0 on $U_n^c$.
The support of the function $f_n$ is then contained in $U_n^{\mathrm{cl}}$, which is compact. By
replacing the functions $f_n$ by $g_n = \min\{f_1, \dots, f_n\}$, we may assume that they
are pointwise decreasing. Then (c) follows.
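
The construction in part (a) can be made concrete in the simplest case. The sketch below assumes $N = 1$ and $V = (0, 1)$, so that $D(x, V^c) = \min(x, 1 - x)$ and $F_n = [1/n, 1 - 1/n]$ for $n \ge 3$; the function names and the sample grid are hypothetical and serve only this illustration. Exact rational arithmetic avoids rounding issues at the endpoints.

```python
from fractions import Fraction


def dist_to_complement(x):
    """Distance from x in V = (0, 1) to the complement of V in R."""
    return min(x, 1 - x)


def in_F(n, x):
    """Membership in F_n = {x in V : D(x, V^c) >= 1/n and |x| <= n}."""
    return 0 < x < 1 and dist_to_complement(x) >= Fraction(1, n) and abs(x) <= n


def in_interior_F(n, x):
    """Membership in the interior of F_n, which here is {x : D(x, V^c) > 1/n}."""
    return 0 < x < 1 and dist_to_complement(x) > Fraction(1, n)


samples = [Fraction(k, 1000) for k in range(1, 1000)]

# F_n is contained in the interior of F_{n+1} (checked on the sample points).
nested = all(
    in_interior_F(n + 1, x) for n in range(3, 40) for x in samples if in_F(n, x)
)

# Every sample point of V lies in some F_n, consistent with the union being V.
covered = all(any(in_F(n, x) for n in range(3, 1001)) for x in samples)
print(nested, covered)
```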


The Borel sets in the open set $V$ are the sets in the $\sigma$-algebra
$$\mathcal{B}_N(V) = \mathcal{B}_N \cap V = \{E \cap V \mid E \in \mathcal{B}_N\}$$
of subsets of $V$. We can regard $V$ as a metric space by restricting the distance
function on $\mathbb{R}^N$, and because $V$ is open, the open sets of $V$ are the open sets of
$\mathbb{R}^N$ that are subsets of $V$. We shall prove the following proposition about these
Borel sets after first proving a lemma.


Proposition 6.23. The $\sigma$-algebra $\mathcal{B}_N(V)$ is the smallest $\sigma$-algebra for $V$
containing the open sets of $V$, and it is the smallest $\sigma$-algebra for $V$ containing
the compact sets of $V$.


Lemma 6.24. Let $X$ be a nonempty set, let $\mathcal{U}$ be a family of subsets of $X$, let
$\mathcal{B}$ be the smallest $\sigma$-ring of subsets of $X$ containing $\mathcal{U}$, and let $E$ be a member of
$\mathcal{B}$. Then $\mathcal{B} \cap E$ is the smallest $\sigma$-ring containing $\mathcal{U} \cap E$.


PROOF OF LEMMA 6.24. Let $\mathcal{A}$ be the smallest $\sigma$-ring containing $\mathcal{U} \cap E$, and let
$\mathcal{A}'$ be the smallest $\sigma$-ring containing $\mathcal{U} \cap E^c$. Since $\mathcal{B} \cap E$ is a $\sigma$-ring of
subsets of $X$ containing $\mathcal{U} \cap E$, $\mathcal{A}$ is contained in $\mathcal{B} \cap E$. Similarly
$\mathcal{A}' \subseteq \mathcal{B} \cap E^c$. Thus the set of
unions $A \cup A'$ with $A \in \mathcal{A}$ and $A' \in \mathcal{A}'$ is contained in $\mathcal{B}$, contains $\mathcal{U}$, and is closed
under countable unions. To see that it is closed under differences, let $A_1 \cup A_1'$ and
$A_2 \cup A_2'$ be such unions. Then $(A_1 \cup A_1') - (A_2 \cup A_2') = (A_1 - A_2) \cup (A_1' - A_2')$
exhibits the difference of the given sets as such a union. Hence the set of such
unions is a $\sigma$-ring and must equal $\mathcal{B}$.


PROOF OF PROPOSITION 6.23. The statement about open sets follows from
Lemma 6.24 by taking $X$ to be $\mathbb{R}^N$, $\mathcal{U}$ to be the set of open sets in $\mathbb{R}^N$, and $E$ to
be $V$. The set $\mathcal{U} \cap E$ is the set of open subsets of $V$, and the lemma says that the
smallest $\sigma$-ring containing $\mathcal{U} \cap E$ is $\mathcal{B}_N(V)$. This is a $\sigma$-algebra of subsets of $V$
since $V$ itself is in $\mathcal{U} \cap V$.

Let $\{F_n\}$ be the sequence of compact subsets of $V$ produced by Lemma 6.22a.
Since $V = \bigcup_{n=1}^\infty F_n$, $V$ is a member of the smallest $\sigma$-ring of subsets containing
the compact subsets of $V$. If $F$ is a relatively closed subset of $V$, then each $F \cap F_n$
is compact, and the countable union $F$ is therefore in this $\sigma$-algebra. Taking
complements, we see that every open subset of $V$ is in the smallest $\sigma$-algebra of
subsets of $V$ containing the compact sets. Therefore $\mathcal{B}_N(V)$ is contained in this
$\sigma$-algebra and must equal this $\sigma$-algebra.
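
Lemma 6.24 can be sanity-checked on a finite set, where a $\sigma$-ring is the same thing as a family closed under finite unions and differences. The following Python sketch uses a hypothetical four-point set $X = \{0,1,2,3\}$, a hypothetical generating family $\mathcal{U}$, and a helper `generated_ring` introduced only for this example.

```python
def generated_ring(family):
    """Smallest family containing `family` that is closed under union and difference.

    On a finite set this coincides with the generated sigma-ring.
    """
    ring = set(family)
    while True:
        # Form all unions and differences of ordered pairs (a - a gives the empty set).
        new_sets = {c for a in ring for b in ring for c in (a | b, a - b)} - ring
        if not new_sets:
            return ring
        ring |= new_sets


U = {frozenset({0, 1}), frozenset({1, 2})}   # generating family (illustrative)
E = frozenset({0, 1})                        # a member of the generated ring B

B = generated_ring(U)
B_cap_E = {A & E for A in B}                 # the trace of B on E
U_cap_E = {A & E for A in U}                 # the trace of U on E

# The lemma predicts that B ∩ E is the ring generated by U ∩ E.
print(E in B, B_cap_E == generated_ring(U_cap_E))
```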


A Borel function on $V$ is a scalar-valued function measurable with respect to
$\mathcal{B}_N(V)$. A Borel measure on $V$ is a measure on $\mathcal{B}_N(V)$ that is finite on every
compact set in $V$.


Theorem 6.25. Every Borel measure $\mu$ on the nonempty open subset $V$ of
$\mathbb{R}^N$ is regular in the sense that the value of $\mu$ on any Borel set $E$ in $V$ is given by
$$\mu(E) = \sup_{\substack{K \subseteq E,\\ K \text{ compact in } V}} \mu(K) = \inf_{\substack{U \supseteq E,\\ U \text{ open in } V}} \mu(U).$$


REMARK. If $\mu$ is a Borel measure on $V$ and if we define $\nu(E) = \mu(E \cap V)$ for
Borel sets $E$ of $\mathbb{R}^N$, then $\nu$ is a measure on the Borel sets of $\mathbb{R}^N$, but $\nu$ need not
be finite on compact sets. Thus Theorem 6.25 is not a special case of Theorem
6.2.


PROOF. This is proved from parts (a) and (b) of Lemma 6.22 in exactly the 
same way that Theorem 6.2 is proved from items (i) and (ii) before the statement 
of that theorem. 


Corollary 6.26. If $\mu$ and $\nu$ are Borel measures on $V$ such that $\int_V f\,d\mu =
\int_V f\,d\nu$ for all $f$ in $C_{\mathrm{com}}(V)$, then $\mu = \nu$.


PROOF. This is proved from Theorem 6.25 and Lemma 6.22c in the same way 
that Corollary 6.3 is proved from Theorem 6.2 and item (iii) before the statement 
of that corollary. 


Corollary 6.27. Let $p = 1$ or $p = 2$. If $\mu$ is a Borel measure on $V$, then
(a) $C_{\mathrm{com}}(V)$ is dense in $L^p(V, \mu)$,
(b) the smallest closed subspace of $L^p(V, \mu)$ containing all indicator
functions of compact subsets of $V$ is $L^p(V, \mu)$ itself,
(c) $C_{\mathrm{com}}(V)$, as a normed linear space under the supremum norm, is separable,
(d) $L^p(V, \mu)$ is separable.


PROOF. Conclusions (a) and (b) are proved from Lemma 6.22c with the aid of 
Propositions 5.56 and 5.55d in the same way that Corollary 6.4 is proved from 
item (iii) before Corollary 6.3 with the aid of those propositions. 

For (c), Lemma 6.22a produces a sequence $\{F_n\}_{n=1}^\infty$ of compact subsets of $V$
with union $V$ such that $F_n \subseteq F_{n+1}^o$ for all $n$. Since the open sets $F_n^o$ cover $V$, they
cover any compact subset $K$ of $V$, and $K$ must be contained in some $F_n^o$. Let us
put that observation aside for a moment. For each $n$, we can identify the vector
subspace of $C_{\mathrm{com}}(V)$ of functions supported in $F_n^o$ with a vector subspace of
$C(F_n)$. The latter is separable by Corollary 2.59, and hence the vector subspace
of $C_{\mathrm{com}}(V)$ of functions supported in $F_n^o$ is separable. If we form the union on
$n$ of these countable dense sets of certain vector subspaces and if we take into
account that the functions supported in any compact subset of $V$ have compact
support within some $F_n^o$, we see that $C_{\mathrm{com}}(V)$ is separable.

For (d), we apply (a) and (c) with $V$ replaced by $F_n^o$ and take into account
that $\mu(F_n) < \infty$. Let $f$ be arbitrary in $L^p(F_n^o, \mu|_{F_n^o})$. If $\varepsilon > 0$ is given, choose
$g$ in $C_{\mathrm{com}}(F_n^o)$ with $\|f - g\|_p \le \varepsilon$. Then choose $h$ in the countable dense set
$D_n$ of $C_{\mathrm{com}}(F_n^o)$ such that $\|g - h\|_{\sup} \le \varepsilon$. Since
$\|f - h\|_p \le \|f - g\|_p + \|g - h\|_p$ and
$\|g - h\|_p^p = \int_{F_n^o} |g(x) - h(x)|^p\,d\mu(x) \le \varepsilon^p \mu(F_n)$, we obtain
$\|f - h\|_p \le \varepsilon + \varepsilon\,\mu(F_n)^{1/p}$. Hence the closure in $L^p(V, \mu)$ of the countable set
$D = \bigcup_{n=1}^\infty D_n$ contains $f$. In particular, $D^{\mathrm{cl}}$ is a vector subspace containing all
indicator functions of compact subsets of $V$. By (b), $D^{\mathrm{cl}} = L^p(V, \mu)$.


Proposition 6.28. With $V$ still open in $\mathbb{R}^N$, let $V'$ be an open set in $\mathbb{R}^{N'}$. Under
a continuous function or even a Borel measurable function $F : V \to V'$, $F^{-1}(E)$
is in $\mathcal{B}_N(V)$ whenever $E$ is in $\mathcal{B}_{N'}(V')$.


PROOF. The set of $E$'s for which $F^{-1}(E)$ is in $\mathcal{B}_N(V)$ is a $\sigma$-algebra, and this
$\sigma$-algebra contains the open geometric rectangles of $\mathbb{R}^{N'}$ by the same argument
as for Proposition 6.8. Thus it contains $\mathcal{B}_{N'}(V')$.


Corollary 6.29. If $V'$ is a second nonempty open subset in $\mathbb{R}^N$ besides $V$,
then any homeomorphism of $V$ onto $V'$ carries $\mathcal{B}_N(V)$ to $\mathcal{B}_N(V')$.


If $K$ is a nonempty compact subset of $\mathbb{R}^N$, it will be convenient to be able to
speak of the Borel sets in $K$, just as we can speak of the Borel sets in an open
subset $V$ of $\mathbb{R}^N$. The theory for $K$ is easier than the theory for $V$, partly because
Borel measures on $K$ can all be obtained by restriction from Borel measures on
$\mathbb{R}^N$.

The Borel sets in $K$ are the sets in $\mathcal{B}_N(K) = \mathcal{B}_N \cap K$. Using Lemma 6.24, we
readily see that $\mathcal{B}_N(K)$ is the smallest $\sigma$-algebra for $K$ containing the compact
subsets of $K$; the argument is simpler than the corresponding proof for Proposition
6.23 in that it is not necessary to produce some sequence of sets by means of
Lemma 6.22.

A Borel function on the compact set $K$ is a scalar-valued function measurable
with respect to $\mathcal{B}_N(K)$. A Borel measure on $K$ is a measure on $\mathcal{B}_N(K)$ that is
finite on compact subsets of $K$. In this situation regularity is a consequence of
the regularity of Borel measures on $\mathbb{R}^N$, and no separate argument is needed. In
fact, if $\mu$ is a Borel measure on $K$, we can define $\nu(E) = \mu(E \cap K)$ for each
Borel subset $E$ of $\mathbb{R}^N$, and then the finiteness of $\mu(K)$ implies that $\nu$ is a Borel
measure on $\mathbb{R}^N$. Borel measures on $\mathbb{R}^N$ are regular, and therefore we have
$$\nu(E) = \sup_{\substack{K' \subseteq E,\\ K' \text{ compact in } \mathbb{R}^N}} \nu(K') = \inf_{\substack{U \supseteq E,\\ U \text{ open in } \mathbb{R}^N}} \nu(U).$$
Replacing $E$ by $E \cap K$ and substituting from the definition of $\nu$, we obtain the
following proposition.


Proposition 6.30. Every Borel measure $\mu$ on a compact nonempty set $K$ in
$\mathbb{R}^N$ is regular in the sense that the value of $\mu$ on any Borel set $E$ in $K$ is given by
$$\mu(E) = \sup_{\substack{K' \subseteq E,\\ K' \text{ compact in } K}} \mu(K') = \inf_{\substack{U \supseteq E,\\ U \text{ relatively open in } K}} \mu(U).$$


4. Comparison of Riemann and Lebesgue Integrals 


This section contains the definitive theorem about the relationship between the
Riemann integral and the Lebesgue integral in $\mathbb{R}^N$. The Riemann integral is
defined in Section III.7, the Lebesgue integral is defined in Section V.3, and
Lebesgue measure in $\mathbb{R}^N$ is defined in Section 1 of the present chapter. In order
to have a notational distinction between the Riemann and Lebesgue integrals,
we write in this section $\mathcal{R}\!\int_A f\,dx$ for the Riemann integral of a bounded
real-valued function on a compact geometric rectangle $A$, and we write $\int_A f\,dx$ for
the Lebesgue integral.


Theorem 6.31. Suppose that $f$ is a bounded real-valued function on a compact
geometric rectangle $A$ in $\mathbb{R}^N$. Then $f$ is Riemann integrable on $A$ if and only if
$f$ is continuous except on a Lebesgue measurable set of Lebesgue measure 0. In
this case, $f$ is Lebesgue measurable, and $\mathcal{R}\!\int_A f\,dx = \int_A f\,dx$.


PROOF. Proposition 6.12 shows that a set in $\mathbb{R}^N$ has "measure 0" in the sense
of Chapter III if and only if it is Lebesgue measurable of measure 0, and Theorem
3.29 shows that $f$ is Riemann integrable on $A$ if and only if $f$ is continuous
except on a set of measure 0. This proves the first conclusion of the theorem.

For the second conclusion, suppose that $\mathcal{R}\!\int_A f\,dx$ exists. Lemma 3.23 shows
that there exists a sequence of partitions $P^{(k)}$ of $A$, each refining the previous one,
such that the lower Riemann sums $L(P^{(k)}, f)$ increase to $\mathcal{R}\!\int_A f\,dx$ and the upper
Riemann sums $U(P^{(k)}, f)$ decrease to $\mathcal{R}\!\int_A f\,dx$. For each $k$, we define Borel
functions $L_k$ and $U_k$ on $A$ as follows: If $x$ is an interior point of some component
(closed) rectangle $S$ of $P^{(k)}$, we define $L_k(x) = m_S(f)$, where $m_S(f)$ is the
infimum of $f$ on $S$; otherwise we let $L_k(x) = 0$. If we write $|S|$ for the volume of
$S$, then the Lebesgue integral of $L_k$ over $S$ is given by $\int_S L_k(x)\,dx = m_S(f)|S|$.
Consequently
$$\int_A L_k(x)\,dx = \sum_S m_S(f)|S| = L(P^{(k)}, f).$$
We define $U_k(x)$ similarly, using the supremum $M_S(f)$ of $f$ on $S$ instead of the
infimum, and then
$$\int_A U_k(x)\,dx = \sum_S M_S(f)|S| = U(P^{(k)}, f).$$
Let $E$ be the subset of points $x$ in $A$ such that $x$ is in the interior of a component
rectangle of $P^{(k)}$ for all $k$. The set $A - E$ is a Borel set of Lebesgue measure 0.
Since $P^{(k+1)}$ is a refinement of $P^{(k)}$ for every $k$, we have $L_k(x) \le L_{k+1}(x)$ and
$U_k(x) \ge U_{k+1}(x)$ for all $x$ in $E$ and all $k$. Therefore $L(x) = \lim L_k(x)$ and
$U(x) = \lim U_k(x)$ exist for $x$ in $E$. Since $L_k(x) \le f(x) \le U_k(x)$ for $x$ in $E$, we
see that
$$L(x) \le f(x) \le U(x) \quad\text{for all } x \text{ in } E.$$


Define $L(x) = U(x) = 0$ on $A - E$. Then $L$ and $U$ are Borel functions with
$L(x) \le U(x)$ everywhere on $A$. On $E$, we have dominated convergence, and
thus
$$\int_E L(x)\,dx = \lim_k \int_E L_k(x)\,dx \quad\text{and}\quad \int_E U(x)\,dx = \lim_k \int_E U_k(x)\,dx.$$
The set $A - E$ has Lebesgue measure 0, and therefore these equations imply that
$$\int_A L(x)\,dx = \lim_k \int_A L_k(x)\,dx \quad\text{and}\quad \int_A U(x)\,dx = \lim_k \int_A U_k(x)\,dx.$$
Consequently
$$\int_A L(x)\,dx = \lim_k \int_A L_k(x)\,dx = \lim_k L(P^{(k)}, f) = \mathcal{R}\!\int_A f\,dx$$
$$= \lim_k U(P^{(k)}, f) = \lim_k \int_A U_k(x)\,dx = \int_A U(x)\,dx.$$


Since $L(x) \le U(x)$ on $A$, Corollary 5.23 shows that the set $F$ where $L(x) = U(x)$
is a Borel set such that $A - F$ has Lebesgue measure 0. Since the inequalities
$L(x) \le f(x) \le U(x)$ are valid for $x$ in $E$, $f(x)$ equals $L(x)$ at least on $E \cap F$.
The set $E \cap F$ is a Borel set, and $L$ is Borel measurable; hence the restriction of $f$
to $E \cap F$ is Borel measurable. The set $A - (E \cap F)$ is a Borel set of measure 0, and
the restriction of $f$ to this set is Lebesgue measurable no matter what values $f$
assumes on this set. Thus $f$ is Lebesgue measurable. Then the Lebesgue integral
$\int_A f\,dx$ is defined, and we have
$$\int_A f\,dx = \int_{E\cap F} f(x)\,dx = \int_{E\cap F} L(x)\,dx = \int_A L(x)\,dx = \mathcal{R}\!\int_A f\,dx.$$
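
The monotone behavior of the lower and upper sums under refinement, which drives the proof, can be observed numerically. The sketch below assumes the one-dimensional example $f(x) = x^2$ on $A = [0, 1]$ with dyadic uniform partitions; the function name `riemann_sums` is hypothetical, and the shortcut of reading the infimum and supremum off the endpoints is valid only because this $f$ is increasing.

```python
def riemann_sums(f, a, b, pieces):
    """Lower and upper Riemann sums of an increasing f on [a, b], uniform partition.

    For increasing f the infimum on each cell is attained at the left endpoint
    and the supremum at the right endpoint.
    """
    width = (b - a) / pieces
    lower = sum(f(a + k * width) for k in range(pieces)) * width
    upper = sum(f(a + (k + 1) * width) for k in range(pieces)) * width
    return lower, upper


def f(x):
    return x * x


# Partitions into 2, 4, ..., 1024 cells; each refines the previous one.
sums = [riemann_sums(f, 0.0, 1.0, 2 ** k) for k in range(1, 11)]
for lower, upper in sums:
    print(lower, upper)
```

The lower sums increase and the upper sums decrease toward the common value $1/3$, matching the roles of $L(P^{(k)}, f)$ and $U(P^{(k)}, f)$ in the argument.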


5. Change of Variables for the Lebesgue Integral 


A general-looking change-of-variables formula for the Riemann integral was
proved in Section III.10. On closer examination of the theorem, we found that
the result did not fully handle even as ostensibly simple a case as the change from
Cartesian coordinates in $\mathbb{R}^2$ to polar coordinates. Lebesgue integration gives us
methods that deal with all the unpleasantness that was concealed by the earlier
formula.


Theorem 6.32 (change-of-variables formula). Let $\varphi$ be a one-one function of
class $C^1$ from an open subset $U$ of $\mathbb{R}^N$ onto an open subset $\varphi(U)$ of $\mathbb{R}^N$ such that
$\det \varphi'(x)$ is nowhere 0. Then
$$\int_{\varphi(U)} f(y)\,dy = \int_U f(\varphi(x))\,|\det \varphi'(x)|\,dx$$
for every nonnegative Borel function $f$ defined on $\varphi(U)$.


REMARK. The $\sigma$-algebra on $\varphi(U)$ is understood to be $\mathcal{B}_N \cap \varphi(U)$, the set
of intersections of Borel sets in $\mathbb{R}^N$ with the open set $\varphi(U)$. If $f$ is extended
from $\varphi(U)$ to $\mathbb{R}^N$ by defining it to be 0 off $\varphi(U)$, then measurability of $f$ with
respect to this $\sigma$-algebra is the same as measurability of the extended function
with respect to $\mathcal{B}_N$.


PROOF. Theorem 3.34 gives us the change-of-variables formula, as an equality
of Riemann integrals, for every $f$ in $C_{\mathrm{com}}(\varphi(U))$. In this case the integrands on
both sides, when extended to be 0 outside the regions of integration, are continuous
on all of $\mathbb{R}^N$, and the integrations can be viewed as involving continuous functions
on compact geometric rectangles. Proposition 6.11 (or Theorem 6.31 if one
prefers) allows us to reinterpret the equality as an equality of Lebesgue integrals.

In the extension of this identity to all nonnegative Borel functions, measurability
will not be an issue. The function $f$ is to be measurable with respect
to $\mathcal{B}_N(\varphi(U))$, and Corollary 6.29 shows that such $f$'s correspond exactly to
functions $f \circ \varphi$ measurable with respect to $\mathcal{B}_N(U)$.

Using Theorem 5.19, define a measure $\mu$ on $\mathcal{B}_N(U)$ by
$$\mu(E) = \int_E |\det \varphi'(x)|\,dx.$$
Corollary 5.28 implies that $\mu$ satisfies
$$\int_U g\,d\mu = \int_U g(x)\,|\det \varphi'(x)|\,dx \qquad (*)$$
for every nonnegative $g$ on $U$ measurable with respect to $\mathcal{B}_N(U)$. Next define
another set function $\nu$ on $\mathcal{B}_N(U)$ by
$$\nu(E) = m(\varphi(E)),$$
where $m$ is Lebesgue measure. It is immediate that $\nu$ is a measure, and we have
$\int_{\varphi(U)} I_E(\varphi^{-1}(y))\,dy = \int_{\varphi(U)} I_{\varphi(E)}(y)\,dy = m(\varphi(E)) = \nu(E) = \int_U I_E\,d\nu$. Passing
to simple functions $\ge 0$ and then using monotone convergence, we obtain
$$\int_{\varphi(U)} g \circ \varphi^{-1}\,dy = \int_U g\,d\nu \qquad (**)$$
for every nonnegative $g$ on $U$ measurable with respect to $\mathcal{B}_N(U)$.
If in $(**)$ and $(*)$ we take $g = f \circ \varphi$ with $f$ in $C_{\mathrm{com}}(\varphi(U))$ and we substitute
into the change-of-variables formula as it is given for $f$ in $C_{\mathrm{com}}(\varphi(U))$, we obtain
the identity
$$\int_U g\,d\nu = \int_U g\,d\mu \qquad (\dagger)$$
for all $g$ in $C_{\mathrm{com}}(U)$. From Corollary 6.26 we conclude that $\mu = \nu$. Hence
$(\dagger)$ holds for every nonnegative $g$ on $U$ measurable with respect to $\mathcal{B}_N(U)$. We
unwind $(\dagger)$ using $(**)$ and $(*)$ with $g = f \circ \varphi$ but now taking $f$ to be any
nonnegative function on $\varphi(U)$ measurable with respect to $\mathcal{B}_N(\varphi(U))$, and we
obtain the conclusion of the theorem.


Let us return to the example of polar coordinates in $\mathbb{R}^2$, first considered in
Section III.10. The data in the theorem are
$$U = \{(r, \theta) \mid 0 < r < +\infty \text{ and } 0 < \theta < 2\pi\},$$
$$\begin{pmatrix} x \\ y \end{pmatrix} = \varphi\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix},$$
and we have
$$\varphi(U) = \mathbb{R}^2 - \{(x, 0) \mid x \ge 0\}.$$
Since $\det \varphi'(r, \theta) = r$, Theorem 6.32 gives
$$\int_{\varphi(U)} f(x, y)\,dx\,dy = \int_{0<r<\infty,\ 0<\theta<2\pi} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta$$
for every nonnegative Borel function $f$ on $\varphi(U)$. The set of integration $\varphi(U)$ on
the left side is not quite the whole plane; it omits the part of the $x$ axis where
$x \ge 0$. But this is a harmless defect: this subset of the $x$ axis is contained in the
entire $x$ axis, which is an abstract rectangle in the sense of Fubini's Theorem and
has measure 0. Thus the formula can be changed to read
$$\int_{\mathbb{R}^2} f(x, y)\,dx\,dy = \int_{0<r<\infty,\ 0<\theta<2\pi} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta$$
for every nonnegative Borel function $f$ on $\mathbb{R}^2$. Here is an application of this
formula that we shall use in proving the Fourier Inversion Formula in Chapter VIII.


Proposition 6.33. $\displaystyle\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$.


REMARK. Since we now know from Theorem 6.31 that there is no discrepancy 
between the Riemann integral and the Lebesgue integral with respect to Lebesgue 
measure, there will be no harm in the future in writing limits of integration in the 
usual way for integrals with respect to 1-dimensional Lebesgue measure. 


PROOF. We use polar coordinates and Fubini's Theorem to compute the square
of the integral in question:
$$\Big(\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx\Big)^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\pi x^2} e^{-\pi y^2}\,dx\,dy = \int_{\mathbb{R}^2} e^{-\pi(x^2+y^2)}\,dx\,dy$$
$$= \int_0^{\infty}\int_0^{2\pi} e^{-\pi r^2}\,r\,d\theta\,dr = 2\pi \int_0^{\infty} r e^{-\pi r^2}\,dr$$
$$= 2\pi \lim_N \int_0^N r e^{-\pi r^2}\,dr = \lim_N \big[-e^{-\pi r^2}\big]_0^N = 1.$$
Since the integral in question is certainly $> 0$, the proposition follows.


Proposition 6.33 is closely related to properties of the "gamma function," a
certain function of a complex variable that reduces essentially to the factorial
function on the positive integers. The definition of the gamma function makes
use of the expression $t^s$ defined for $0 < t < +\infty$ and $s$ in $\mathbb{C}$ by $t^s = e^{s \log t}$.

Fix $s \in \mathbb{C}$ with $\mathrm{Re}\,s > 0$. The function $t \mapsto t^{s-1}e^{-t}$ is continuous on
$(0, +\infty)$ and hence Borel measurable. Let us see that it is integrable with respect
to Lebesgue measure. Since $|t^{s-1}e^{-t}| = t^{\mathrm{Re}\,s-1}e^{-t}$, we may assume that $s$ is real
(and positive) in showing the integrability. Integrability on $(0, 1]$ is no problem,
since we know that $\int_0^1 t^{s-1}\,dt < \infty$ for $s > 0$. To handle $[1, +\infty)$, let $n$ be
an integer $\ge s - 1$. Then
$$t^{s-1} \le t^n = 2^n n!\Big(\frac{1}{n!}\Big(\frac{t}{2}\Big)^n\Big) \le 2^n n! \sum_{k=0}^{\infty} \frac{1}{k!}\Big(\frac{t}{2}\Big)^k = 2^n n!\,e^{t/2}.$$
Hence $t^{s-1}e^{-t} \le 2^n n!\,e^{-t/2}$, and the integrability on $[1, +\infty)$ follows.


With this integrability in place, we define the gamma function by
$$\Gamma(s) = \int_0^{\infty} t^{s-1}e^{-t}\,dt \quad\text{for } \mathrm{Re}\,s > 0.$$


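Proposition 6.34. The gamma function has the properties that
(a) $\Gamma(s + 1) = s\Gamma(s)$ for $\mathrm{Re}\,s > 0$,

(b) $\Gamma(1) = 1$ and $\Gamma(n + 1) = n!$ for integers $n \ge 0$,

(c) $\Gamma(\tfrac12) = \sqrt{\pi}$.


PROOF. Part (a) follows from integration by parts, which needs to be done on
an interval $[\varepsilon, M]$ and followed by passages to the limit $\varepsilon \to 0$ and $M \to \infty$. In
(b), the formula $\Gamma(1) = 1$ just amounts to the elementary integral $\int_0^\infty e^{-t}\,dt = 1$,
and then the formula $\Gamma(n + 1) = n!$ for integers $n \ge 0$ follows by iterating (a).
For (c), the change of variables $t = \pi x^2$ gives
$$\Gamma(\tfrac12) = \int_0^\infty t^{-1/2}e^{-t}\,dt = \int_0^\infty (\pi x^2)^{-1/2} e^{-\pi x^2}\,2\pi x\,dx = 2\sqrt{\pi}\int_0^\infty e^{-\pi x^2}\,dx.$$
Since $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 2\int_0^\infty e^{-\pi x^2}\,dx$, Proposition 6.33 allows us to conclude
that $2\int_0^\infty e^{-\pi x^2}\,dx = 1$. Hence $\Gamma(\tfrac12) = \sqrt{\pi}$.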
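
Propositions 6.33 and 6.34 lend themselves to a quick numerical check. The following Python sketch is only a sanity test under stated assumptions: the Gaussian integral is truncated to $[-6, 6]$ and approximated by a hypothetical midpoint-rule helper, the gamma values come from the standard-library function `math.gamma` rather than from the defining integral, and the sample value $s = 2.7$ with $n = 2$ is chosen arbitrarily to exercise the integrability bound $t^{s-1}e^{-t} \le 2^n n!\,e^{-t/2}$.

```python
import math


def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# Proposition 6.33: the tails beyond |x| = 6 are far below double precision.
gauss = midpoint(lambda x: math.exp(-math.pi * x * x), -6.0, 6.0)

# Proposition 6.34(c) and (b), using the library gamma function.
gamma_half = math.gamma(0.5)
factorials_match = all(
    math.isclose(math.gamma(n + 1), math.factorial(n)) for n in range(15)
)

# Proposition 6.34(a) at one real sample point.
s = 2.7
recurrence_gap = abs(math.gamma(s + 1.0) - s * math.gamma(s))

# Integrability bound on [1, 50.9] with n = 2, an integer >= s - 1.
n = 2
bound_holds = all(
    t ** (s - 1.0) * math.exp(-t) <= 2 ** n * math.factorial(n) * math.exp(-t / 2.0)
    for t in (1.0 + 0.1 * k for k in range(500))
)

print(gauss, gamma_half, math.sqrt(math.pi), factorials_match, recurrence_gap, bound_holds)
```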


It is often true in applications of the change-of-variables formula that the set
$\varphi(U)$ does not exhaust the set that one might hope to have as region of integration.
For polar coordinates the exceptional set was the part of the $x$ axis with $x \ge 0$,
and an easy argument showed that the exceptional set had measure 0. In a more
complicated example, that easy argument will not ordinarily apply, but still the
exceptional set has a certain "lower-dimensional" quality to it. A general result
saying that certain lower-dimensional sets have measure 0 will be given as a
corollary of Sard's Theorem, which we prove now.

Let $\psi : V \to \mathbb{R}^N$ be a smooth map defined on an open subset $V$ of $\mathbb{R}^N$. A
critical point $x$ of $\psi$ is a point where $\psi'(x)$ has rank $< N$. In this case, $\psi(x)$ is
called a critical value.


Theorem 6.35 (Sard's Theorem). If $\psi : V \to \mathbb{R}^N$ is a smooth map defined
on an open subset $V$ of $\mathbb{R}^N$, then the set of critical values of $\psi$ is a Borel set of
Lebesgue measure 0 in $\mathbb{R}^N$.


PROOF. The set where $\psi'(x)$ has rank $\le N - 1$ is relatively closed in $V$ and
hence is the union of countably many compact sets. The set of critical values is
then the union of the compact images of these sets and consequently is a Borel set.
Let us see that this Borel set has Lebesgue measure 0. Since $V$ is the countable
union of compact geometric rectangles and since the countable union of sets of
measure 0 is of measure 0, it is enough to prove the theorem for the restriction of
$\psi$ to a compact geometric rectangle $R$.

For points $x = (x_1, \dots, x_N)$ and $x' = (x_1', \dots, x_N')$ in $R$, the Mean Value
Theorem gives
$$\psi_i(x') - \psi_i(x) = \sum_{j=1}^{N} \frac{\partial \psi_i}{\partial x_j}(z_i)\,(x_j' - x_j), \qquad (*)$$
where $z_i$ is a point on the line segment from $x$ to $x'$. Since the $\frac{\partial \psi_i}{\partial x_j}$ are bounded
on $R$, we see as a consequence that
$$|\psi(x') - \psi(x)| \le a|x' - x| \qquad (**)$$
with $a$ independent of $x$ and $x'$. Let $L_x(x') = (L_{x,1}(x'), \dots, L_{x,N}(x'))$ be the
best first-order approximation to $\psi$ about $x$, namely
$$L_{x,i}(x') = \psi_i(x) + \sum_{j=1}^{N} \frac{\partial \psi_i}{\partial x_j}(x)\,(x_j' - x_j).$$
Subtracting this equation from $(*)$, we obtain
$$\psi_i(x') - L_{x,i}(x') = \sum_{j=1}^{N} \Big(\frac{\partial \psi_i}{\partial x_j}(z_i) - \frac{\partial \psi_i}{\partial x_j}(x)\Big)(x_j' - x_j).$$
Since $\frac{\partial \psi_i}{\partial x_j}$ is smooth and $|z_i - x| \le |x' - x|$, we deduce that
$$|\psi(x') - L_x(x')| \le b|x' - x|^2 \qquad (\dagger)$$
with $b$ independent of $x$ and $x'$.

If $x$ is a critical point, let us bound the image of the set of $x'$ with $|x' - x| \le c$.
The determinant of the linear part of $L_x$ is 0, and hence $L_x$ has image in a
hyperplane, not necessarily a coordinate hyperplane. By $(\dagger)$, $\psi(x')$ has distance
$\le bc^2$ from this hyperplane. In each of the $N - 1$ perpendicular directions,
$(**)$ shows that $\psi(x')$ and $\psi(x)$ are at distance $\le ac$ from each other. Thus
$\psi(x')$ is contained in a box¹ centered at $\psi(x)$ with volume $2^N (ac)^{N-1}(bc^2) =
2^N a^{N-1} b\,c^{N+1}$.

We subdivide $R$ into $M^N$ smaller compact geometric rectangles whose dimensions
are $1/M$ times those of $R$. If $d$ is the diameter of $R$ and if one of these
smaller geometric rectangles $R'$ contains a critical point $x$, then any point $x'$ in $R'$
has $|x' - x| \le d/M$. By the result of the previous paragraph, $\psi$ of $R'$ is contained
in a box of volume $2^N a^{N-1} b\,(d/M)^{N+1}$. The union of these boxes, taken over all
of the smaller geometric rectangles containing critical points, contains the critical
values. Since there are at most $M^N$ of the smaller geometric rectangles, the outer
measure $m^*$ of the set of critical values, where $m$ refers to Lebesgue measure, is
$\le 2^N a^{N-1} b\,d^{N+1} M^{-1}$. This estimate is valid for all $M$, and hence the set $S$ of
critical values has $m^*(S) = 0$. Therefore the Borel set $S$ has Lebesgue measure 0.
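
The covering argument can be watched in action in dimension $N = 1$. The sketch below assumes the sample map $\psi(x) = x^3 - 3x$ on the rectangle $R = [-2, 2]$, whose critical points are $\pm 1$; the function `covering_length` is a hypothetical helper that adds up the lengths of the images of the subintervals containing a critical point. It computes the actual image lengths rather than the cruder bound from the proof, so it illustrates the mechanism and not the specific constants $a$ and $b$.

```python
def psi(x):
    return x ** 3 - 3.0 * x


# Zeros of psi'(x) = 3x^2 - 3; the critical values are psi(-1) = 2 and psi(1) = -2.
CRITICAL_POINTS = (-1.0, 1.0)


def covering_length(M, a=-2.0, b=2.0):
    """Total length of psi(R') over the M equal subintervals R' containing a critical point."""
    width = (b - a) / M
    total = 0.0
    for k in range(M):
        left, right = a + k * width, a + (k + 1) * width
        inside = [c for c in CRITICAL_POINTS if left <= c <= right]
        if inside:
            # A differentiable function attains its extremes on [left, right] at an
            # endpoint or at an interior critical point, so these values suffice.
            values = [psi(left), psi(right)] + [psi(c) for c in inside]
            total += max(values) - min(values)
    return total


lengths = [covering_length(M) for M in (10, 100, 1000)]
print(lengths)
```

The totals shrink as $M$ grows, so the two critical values are covered by intervals of arbitrarily small total length, as the estimate in the proof predicts.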


Corollary 6.36. If $\psi : V \to \mathbb{R}^N$ is a smooth map defined on an open subset
$V$ of $\mathbb{R}^M$ with $M < N$, then the image of $\psi$ is a Borel set of Lebesgue measure 0
in $\mathbb{R}^N$.

PROOF. Sard's Theorem (Theorem 6.35) applies to the composition of the
projection $\mathbb{R}^N \to \mathbb{R}^M$ followed by $\psi$. Every point of the domain is a critical
point, and hence every point of the image is a critical value. The result follows.


We define a lower-dimensional set in $\mathbb{R}^N$ to be any set contained in the
countable union of smooth images of open sets in Euclidean spaces of dimension
$< N$. The following result is immediate from Corollary 6.36.


Corollary 6.37. Any lower-dimensional set in $\mathbb{R}^N$ is Lebesgue measurable of
Lebesgue measure 0.


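The $N$-dimensional generalization of polar coordinates in $\mathbb{R}^2$ is spherical
coordinates in $\mathbb{R}^N$. In the notation of Theorem 6.32, we have
$$U = \left\{ \begin{pmatrix} r \\ \theta_1 \\ \vdots \\ \theta_{N-1} \end{pmatrix} \ \middle|\ \begin{array}{l} 0 < r < +\infty, \\ 0 < \theta_j < \pi \text{ for } 1 \le j \le N-2, \\ 0 < \theta_{N-1} < 2\pi \end{array} \right\}$$
and
$$\begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix} = \varphi\begin{pmatrix} r \\ \theta_1 \\ \vdots \\ \theta_{N-1} \end{pmatrix} = \begin{pmatrix} r\cos\theta_1 \\ r\sin\theta_1\cos\theta_2 \\ \vdots \\ r\sin\theta_1\cdots\sin\theta_{N-2}\cos\theta_{N-1} \\ r\sin\theta_1\cdots\sin\theta_{N-2}\sin\theta_{N-1} \end{pmatrix}.$$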


¹This box need not have its faces parallel to the coordinate hyperplanes.

Problem 2 at the end of the chapter asks for three things to be checked:
(i) the determinant factor in the change-of-variables formula is given by
$$|\det \varphi'| = r^{N-1} \sin^{N-2}\theta_1 \sin^{N-3}\theta_2 \cdots \sin\theta_{N-2},$$
(ii) $\varphi$ is one-one on $U$,
(iii) the complement of $\varphi(U)$ in $\mathbb{R}^N$ is a lower-dimensional set.


Then it follows that the change-of-variables formula applies and that the integration
over $\varphi(U)$ can be extended over $\mathbb{R}^N$. We can write the result as
$$\int_{\mathbb{R}^N} f(x)\,dx = \int_{r=0}^{\infty}\int_{\theta_1=0}^{\pi}\cdots\int_{\theta_{N-2}=0}^{\pi}\int_{\theta_{N-1}=0}^{2\pi} f(r\cos\theta_1, \dots)$$
$$\times\ r^{N-1}\sin^{N-2}\theta_1 \cdots \sin\theta_{N-2}\ d\theta_{N-1}\cdots d\theta_1\,dr.$$
The expression $\sin^{N-2}\theta_1 \cdots \sin\theta_{N-2}\,d\theta_{N-1}\cdots d\theta_1$ we abbreviate as $d\omega$.
Geometrically it is the contribution to Lebesgue measure on $\mathbb{R}^N$ from the sphere $S^{N-1}$
of radius 1 centered at the origin. In Chapter XI we shall speak of Borel sets in
any compact metric space. The sphere $S^{N-1}$ is a compact metric space, and we
shall note that $d\omega$ refers to a rotation-invariant Borel measure on $S^{N-1}$. We write
$$\Omega_{N-1} = \int_{S^{N-1}} d\omega$$
for the "area" of the sphere $S^{N-1}$. This constant is evaluated in Problem 12 at the
end of the present chapter with the aid of Proposition 6.33. In terms of $d\omega$, the
change-of-variables formula for spherical coordinates is
$$\int_{\mathbb{R}^N} f(x)\,dx = \int_{r=0}^{\infty}\int_{\omega \in S^{N-1}} f(r\omega)\,r^{N-1}\,d\omega\,dr.$$
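
For $N = 3$ the angular expression is $d\omega = \sin\theta_1\,d\theta_2\,d\theta_1$, and the formula can be tested on a case whose answer is known from elementary geometry. The sketch below is an illustration under assumptions: it uses a hypothetical midpoint-rule helper, it takes $f$ to be the indicator function of the unit ball, and it compares the results with the classical values $4\pi$ for the area of the unit sphere and $4\pi/3$ for the volume of the unit ball, which are not derived in the text.

```python
import math


def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# Omega_2: integrate sin(theta_1) over (0, pi); the theta_2 integral contributes 2*pi.
omega_2 = 2.0 * math.pi * midpoint(math.sin, 0.0, math.pi)

# With f the indicator of the unit ball, the spherical-coordinates formula reads
# volume = Omega_2 * (integral of r^2 over 0 <= r <= 1).
unit_ball_volume = omega_2 * midpoint(lambda r: r * r, 0.0, 1.0)

print(omega_2, 4.0 * math.pi)
print(unit_ball_volume, 4.0 * math.pi / 3.0)
```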


The formula for spherical coordinates allows us quickly to check the integrability
of powers of $|x|$ near the origin and near $\infty$. In fact, we have
$$\int_{|x| \le 1} |x|^q\,dx = \Omega_{N-1}\int_0^1 r^{q+N-1}\,dr$$
and
$$\int_{|x| \ge 1} |x|^q\,dx = \Omega_{N-1}\int_1^{\infty} r^{q+N-1}\,dr,$$
from which we see that
$$|x|^q \text{ is integrable near } \begin{cases} 0 & \text{for } q > -N, \\ \infty & \text{for } q < -N. \end{cases}$$


6. Hardy–Littlewood Maximal Theorem


This section takes a first look at the theory of almost-everywhere convergence. 
The theory developed historically out of Lebesgue’s work on an extension of the 
Fundamental Theorem of Calculus to general integrable functions on intervals of 
the line, work that we address largely in the next chapter. We shall see gradually 
that the theory applies to a broader range of problems than the ones immediately 
generalizing Lebesgue’s work, and one can make a case that nowadays the theory 
in this section is of considerably greater significance in real analysis than one 
might expect from Lebesgue’s work on the Fundamental Theorem. 

The theory brings together two threads. The first thread is the observation that 
an effort to differentiate integrals of general integrable functions on an interval 
of the line can be reinterpreted as a problem of almost-everywhere convergence 
in connection with an approximate identity of the kind in Theorem 6.20. In 
explaining this assertion, let us denote Lebesgue measure by m as necessary. 
To differentiate $F(x) = \int_a^x f(t)\,dt$, one forms the usual difference quotient
$h^{-1}[F(x + h) - F(x)]$, which can be written for $h > 0$ as
$$\frac{1}{m([-h, 0])}\int_{[-h,0]} f(x - y)\,dy = \int_{\mathbb{R}^1} f(x - y)\,m([-h, 0])^{-1} I_{[-h,0]}(y)\,dy$$
or as $f * \varphi_h(x)$, where $\varphi(y) = m([-1, 0])^{-1} I_{[-1,0]}(y)$. Here $\varphi$ has integral 1, and
$\varphi_h$ is the normalized dilated function defined in Section 2 by $\varphi_h(y) = h^{-1}\varphi(h^{-1}y)$
in the 1-dimensional case. Theorem 6.20 says for $p = 1$ and $p = 2$ that as $h$
decreases to 0, $f * \varphi_h$ converges to $f$ in $L^p$ if $f$ is in $L^p$. Also, $f * \varphi_h$ converges
uniformly to $f$ if $f$ is bounded and uniformly continuous, and $f * \varphi_h(x)$ converges
to $f(x)$ at the point $x$ if $f$ is bounded and is continuous at $x$. The problem about
differentiation of integrals asks about convergence almost everywhere.

We shall want to have a theorem in $\mathbb{R}^N$, and for this purpose an $N$-dimensional
version of $I_{[-1,0]}$ does not seem attractive for generalizing. Instead, let us generalize
from $I_{[-1,1]}$, taking the $N$-dimensional problem to involve a ball $B$ of radius 1
centered at the origin; there is some flexibility in choosing the set $B$, and a cube
centered at the origin would work as well. We write $rB$ for the set of dilates of
the members of $B$ by the scalar $r$. Thus we investigate
$$m(rB)^{-1}\int_{rB} f(x - y)\,dy$$
as $r$ decreases to 0; equivalently we investigate
$$f * \varphi_r(x), \quad\text{where } \varphi(y) = m(B)^{-1} I_B(y).$$


The second thread comes from making a simple observation and then trying
to prove the converse in specific settings, as improbable as it sounds. The 
observation is that if some sequence of nonnegative functions indexed by n 
is to converge almost everywhere, its supremum on n must be finite almost 
everywhere. A converse would say that a finite supremum almost everywhere 
implies convergence almost everywhere. Banach succeeded in proving an abstract 
such converse in a 1926 paper, making use of the completeness of the space of 
functions he was studying. In a celebrated 1930 paper, Hardy and Littlewood 
proved a concrete such converse in connection with differentiation of integrals; 
they obtained a quantitative estimate about the supremum, and then the almost 
everywhere convergence followed from that estimate and from the fact that the 
convergence certainly takes place for nice functions. Here is an N-dimensional 
version of the basic theorem in that direction. 


Theorem 6.38 (Hardy–Littlewood Maximal Theorem). If $f$ is in $L^1(\mathbb{R}^N)$,
then
$$m\{x \mid f^*(x) > \xi\} \le \frac{5^N \|f\|_1}{\xi}$$
for every $\xi > 0$, where
$$f^*(x) = \sup_{0<r<\infty} m(rB)^{-1}\int_{rB} |f(x - y)|\,dy.$$


Before examining the statement of the theorem more closely and then proving 
the theorem, let us see how to derive a corresponding N-dimensional convergence 
result from it, and let us see how the first part of Lebesgue’s version of the 
Fundamental Theorem of Calculus, the part about differentiation of integrals, 
follows as well. 


Corollary 6.39. If $f$ is integrable on every bounded subset of $\mathbb{R}^N$, then
$$\lim_{r \downarrow 0} m(rB)^{-1}\int_{rB} f(x - y)\,dy = f(x) \quad\text{a.e.}$$


PROOF. Since the convergence for a particular $x$ depends on the behavior of
the function only near $x$, we may assume that $f$ is identically 0 off some bounded
set. The effect of this assumption for our purposes is that $f$ then has to be in
$L^1(\mathbb{R}^N)$. Define
$$T_r(f)(x) = m(rB)^{-1}\int_{rB} f(x - y)\,dy,$$
bearing in mind that $f^*(x) = \sup_{r>0} T_r(|f|)(x)$. If $g$ is continuous of compact
support, then $\lim_{r \downarrow 0} T_r g(x) = g(x)$ everywhere by Theorem 6.20c. Let $\varepsilon > 0$ be
given, and choose by Corollary 6.4 a function $g$ in $C_{\mathrm{com}}(\mathbb{R}^N)$ with $\|f - g\|_1 \le \varepsilon$.
Then
$$\limsup_{r \downarrow 0} |T_r f(x) - f(x)|$$
$$\le \limsup_{r \downarrow 0} |T_r(f - g)(x)| + \limsup_{r \downarrow 0} |T_r g(x) - g(x)| + |g(x) - f(x)|$$
$$\le \sup_{r > 0} |T_r(f - g)(x)| + 0 + |g(x) - f(x)|$$
$$\le (f - g)^*(x) + |g(x) - f(x)|.$$
If the left side is $> \xi$, at least one of the terms on the right side is $> \xi/2$. Hence
$$\{x \mid \limsup |T_r f(x) - f(x)| > \xi\}$$
$$\subseteq \{x \mid (f - g)^*(x) > \xi/2\} \cup \{x \mid |f(x) - g(x)| > \xi/2\}.$$
By Theorem 6.38 and the inequality $a\,m\{x \mid |F(x)| > a\} \le \|F\|_1$ for $a > 0$, the Lebesgue
measure of the right side is
$$\le \frac{2 \cdot 5^N \|f - g\|_1}{\xi} + \frac{2\|f - g\|_1}{\xi} \le \frac{2(5^N + 1)\varepsilon}{\xi}.$$
Since $\varepsilon$ is arbitrary, $S(\xi) = \{x \mid \limsup |T_r f(x) - f(x)| > \xi\}$ has measure 0.
Letting $\xi$ tend to 0 through the values $1/n$, we see that $S(0)$ has measure 0, i.e.,
that $\lim_{r \downarrow 0} T_r f(x) = f(x)$ almost everywhere.


Corollary 6.40 (first part of Lebesgue’s form of the Fundamental Theorem of
Calculus). If $f$ is integrable on every bounded subset of $\mathbb{R}^1$, then $\int_a^x f(y)\,dy$ is
differentiable almost everywhere and
$$\frac{d}{dx}\int_a^x f(y)\,dy = f(x) \quad \text{a.e.}$$

PROOF. For $f$ in $L^1(\mathbb{R}^1)$, let $f^*$ be as in Theorem 6.38, and define
$$f_r^{**}(x) = \sup_{h>0} \frac{1}{h}\int_0^h |f(x+t)|\,dt
\quad\text{and}\quad
f_l^{**}(x) = \sup_{h>0} \frac{1}{h}\int_{-h}^0 |f(x+t)|\,dt.$$
Then
$$f_r^{**}(x) \le \sup_{h>0} \frac{2}{2h}\int_{-h}^{h} |f(x+t)|\,dt = 2f^*(x),$$
and similarly $f_l^{**}(x) \le 2f^*(x)$. From Theorem 6.38 it follows that
$$m\{x \mid f_r^{**}(x) > \xi\} \le 10\|f\|_1/\xi
\quad\text{and}\quad
m\{x \mid f_l^{**}(x) > \xi\} \le 10\|f\|_1/\xi.$$

The same argument as for Corollary 6.39 allows us to conclude, for any $f$
integrable on every bounded subset of $\mathbb{R}^1$, that $\lim_{h\downarrow 0} \frac{1}{h}\int_0^h f(x+t)\,dt = f(x)$
a.e. and $\lim_{h\downarrow 0} \frac{1}{h}\int_{-h}^0 f(x+t)\,dt = f(x)$ a.e. Hence $\frac{d}{dx}\int_a^x f(t)\,dt = f(x)$
almost everywhere for such $f$.


Let us return to Theorem 6.38. The function $f^*(x)$ is called the Hardy–Littlewood
maximal function of $f$. It is measurable because the supremum
over rational $r$ gives the same value of $f^*(x)$ for each $x$. If we let $\xi$ tend to $\infty$ in
the inequality $m\{x \mid f^*(x) > \xi\} \le 5^N\|f\|_1/\xi$, we see immediately that $f^*(x)$
is finite almost everywhere, i.e., that the supremum in question is actually finite
almost everywhere. The inequality is a quantitative version of that qualitative
conclusion.

For any situation in which it is desired to prove an almost-everywhere conver- 
gence theorem, there is an associated maximal function in modern terminology, 
which can be taken as the supremum of the absolute value of the quantity for 
which one is trying to prove almost-everywhere convergence. In the above case 
we used the supremum for | f| instead, which in principle could be larger. 

There is no hope that the Hardy–Littlewood maximal function $f^*$ is actually in
$L^1$ if $f$ is not a.e. the 0 function because the occurrence of large values of $r$ in the
supremum already rules out $L^1$ behavior: in fact, $f^*(x)$ is necessarily $\ge$ a positive
multiple of $|x|^{-N}$ for large $|x|$, and thus $f^*$ cannot be integrable. On the other
hand, $f^*$ is close to integrable: We shall see in Section 10 that the integral of any
nonnegative function $g$ can be computed in terms of the function $m\{x \mid g(x) > \xi\}$
of $\xi$, the formula being $\int_{\mathbb{R}^N} g(x)\,dx = \int_0^\infty m\{x \mid g(x) > \xi\}\,d\xi$. Theorem 6.38
shows that the integrand in the case of $f^*$ is $\le$ a multiple of $1/\xi$, and $1/\xi$ is
close to being integrable on $(0, +\infty)$. This is a better qualitative conclusion than
merely finiteness almost everywhere, and Theorem 6.38 is a quantitative version
of just how close $f^*$ is to being integrable.
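
As a hedged numerical illustration of the two facts just discussed (the weak-type bound of Theorem 6.38 and the decay of $f^*$ like a multiple of $|x|^{-N}$), the following sketch takes $N = 1$ and $f$ equal to the indicator function of $[0, 1]$. That choice of $f$, the function names, and the two grids are assumptions made only for this illustration. The supremum over $r$ is replaced by a maximum over finitely many radii, so the computed values can only underestimate $f^*$.

```python
def average_of_indicator(x, r):
    """Average of the indicator of [0, 1] over the interval (x - r, x + r)."""
    overlap = max(0.0, min(x + r, 1.0) - max(x - r, 0.0))
    return overlap / (2.0 * r)

# Illustrative grids (assumptions of this sketch): radii in (0, 20], points in [-10, 10].
RADII = [k / 20.0 for k in range(1, 401)]
STEP = 0.05
POINTS = [-10.0 + k / 20.0 for k in range(401)]

def maximal_function(x):
    """Lower approximation to f*(x): maximum over the finite list RADII."""
    return max(average_of_indicator(x, r) for r in RADII)

# Values of the approximate maximal function on the grid of points.
F_STAR = [maximal_function(x) for x in POINTS]

def level_set_measure(xi):
    """Riemann-sum estimate of m{x : f*(x) > xi}, restricted to the grid."""
    return STEP * sum(1 for value in F_STAR if value > xi)

if __name__ == "__main__":
    norm_f = 1.0  # L^1 norm of the indicator of [0, 1]
    for xi in (0.1, 0.2, 0.4):
        # Compare the estimated measure with the bound 5^N ||f||_1 / xi for N = 1.
        print(xi, level_set_measure(xi), 5 * norm_f / xi)
    # For x > 1 the best radius is r = x, which gives f*(x) = 1/(2x).
    print(maximal_function(2.0), 1 / (2 * 2.0))
```

For this particular $f$ one can check by hand that $f^*(x) = 1/(2x)$ for $x > 1$, which is not integrable at infinity, while the level sets remain well inside the bound $5\|f\|_1/\xi$.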

The particular property of $f^*$ that is isolated in Theorem 6.38 arises fairly
often. If $g \ge 0$ is integrable and $S$ is the set where $g > \xi$, then $g \ge \xi I_S$
everywhere; hence $\|g\|_1 \ge \xi\, m(S)$ and $m(S) \le \|g\|_1/\xi$. A function $g$ is said to
be in weak $L^1$ if
$$m\{x \mid |g(x)| > \xi\} \le C/\xi$$
for some constant $C$ and for all $\xi > 0$. Theorem 6.38 says that the nonlinear
operator $f \mapsto f^*$ carries $L^1$ to weak $L^1$ with $C$ bounded by a multiple of the $L^1$
norm of $f$, and an operator of this kind that satisfies also a certain sublinearity
property is said to be of weak type (1, 1). We return to this matter, with the
definition in a clearer context, in Chapter IX.

Now let us prove Theorem 6.38. One modern proof uses the following covering
lemma, which takes into account the geometry of $\mathbb{R}^N$ in a surprisingly subtle way.
Once the lemma is in hand, the rest is easy.


Lemma 6.41 (Wiener’s Covering Lemma). Let $E \subseteq \mathbb{R}^N$ be a Borel set, and
suppose that to each $x$ in $E$ there is associated some ball $B(r; x)$ with $r$ perhaps
depending on $x$. If the radii $r = r(x)$ are bounded, then there is a finite or
countable disjoint collection of these balls, say $B(r_1; x_1), B(r_2; x_2), \ldots$, such
that either the collection is infinite and $\inf_{1\le j<\infty} r_j \ne 0$ or
$$E \subseteq \bigcup_{j=1}^{\infty} B(5r_j; x_j).$$
In either case,
$$m(E) \le 5^N \sum_{j=1}^{\infty} m(B(r_j; x_j)).$$


REMARK. The shape of the sets of B(r; x) is not very important. What is 
important is that there be some neighborhood B of the origin that is closed under 
the operation of multiplying all its members by —1 and by any positive number 
r <1. The other sets are obtained from B by dilation and translation. 


PROOF. Let
$$A_1 = \{\text{all sets } B(r; x) \text{ in question}\}
\quad\text{and}\quad
R_1 = \sup\{r \mid B(r; x) \text{ is in } A_1 \text{ for some } x\}.$$
By hypothesis, $R_1$ is finite. Pick some $B(r_1; x_1)$ with $r_1 \ge \tfrac12 R_1$, and let
$$A_2 = \{\text{members of } A_1 \text{ disjoint from } B(r_1; x_1)\}.$$
If $A_2$ is empty, let all further $R_j$'s be 0 and let all further $B(r_j; x_j)$'s be empty. If
$A_2$ is nonempty, let
$$R_2 = \sup\{r \mid B(r; x) \text{ is in } A_2 \text{ for some } x\}.$$
Pick $B(r_2; x_2)$ in $A_2$ with $r_2 \ge \tfrac12 R_2$. Let
$$A_3 = \{\text{members of } A_2 \text{ disjoint from } B(r_2; x_2)\},$$
and proceed inductively to construct $R_3$, $B(r_3; x_3)$, $A_4$, etc.

The numbers $R_j$ are monotone decreasing. We may assume that $\lim R_j = 0$,
since otherwise $\inf_j r_j \ne 0$ and $\sum_j m(B(r_j; x_j)) = +\infty$. Let
$$V_j = \text{union of all sets in } A_j - A_{j+1} \text{ for } j \ge 1
\quad\text{and}\quad
V_0 = \text{union of all sets in } A_1.$$
Then $V_0 = \bigcup_{j=1}^{\infty} V_j$; in fact, if $B(r; x)$ is in $A_1$, then the equality $\lim R_j = 0$
forces there to be a last index $j$ such that $B(r; x)$ is in $A_j$, and this $j$ has the
property that $B(r; x)$ is in $A_j$ and not $A_{j+1}$.

Since $E \subseteq \bigcup_{x\in E} B(r; x) = V_0 = \bigcup_{j=1}^{\infty} V_j$, the proof will be complete if we
show that
$$V_j \subseteq B(5r_j; x_j). \tag{$*$}$$
Thus let $B(r; x)$ be in $A_j - A_{j+1}$. Then $r \le R_j$,
$$B(r; x) \cap B(r_j; x_j) \ne \varnothing,$$
and $r_j \ge \tfrac12 R_j$. Consequently $r \le 2r_j$ and
$$B(r; x - x_j) \cap B(r_j; 0) \ne \varnothing.$$
This condition means that there is some $p$ in $B(r_j; 0)$ with $|x - x_j - p| < r$. If
$q$ is any member of $B(r; x)$, then
$$|q - x_j| \le |q - x| + |x - x_j - p| + |p| < r + r + r_j = 2r + r_j.$$
Thus $q$ is in $B(2r + r_j; x_j) \subseteq B(5r_j; x_j)$, and $(*)$ follows.
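
The selection procedure in this proof can be imitated for a finite family of intervals in $\mathbb{R}^1$. The sketch below is only an illustration under assumptions that are not in the text: the family is finite, so each supremum $R_j$ is attained and one may take $r_j = R_j$ (which certainly satisfies $r_j \ge \tfrac12 R_j$), and the names `wiener_select` and `covered_by_dilates` are hypothetical.

```python
def wiener_select(balls):
    """Greedy selection from a finite list of open intervals (center, radius), radius > 0.

    At each stage pick a remaining interval of largest radius (so r_j = R_j >= R_j / 2),
    then discard every remaining interval that meets it, mirroring the passage
    from A_j to A_{j+1} in the proof.
    """
    remaining = sorted(balls, key=lambda ball: ball[1], reverse=True)
    chosen = []
    while remaining:
        center, radius = remaining[0]
        chosen.append((center, radius))
        # Open intervals (c - r, c + r) and (c2 - r2, c2 + r2) are disjoint
        # exactly when |c - c2| >= r + r2.
        remaining = [(c2, r2) for (c2, r2) in remaining
                     if abs(c2 - center) >= radius + r2]
    return chosen

def covered_by_dilates(ball, chosen, factor=5):
    """True if `ball` lies inside some selected interval dilated by `factor`."""
    c, r = ball
    return any(cj - factor * rj <= c - r and c + r <= cj + factor * rj
               for (cj, rj) in chosen)

# A small illustrative family of intervals, given as (center, radius).
balls = [(0.0, 1.0), (0.5, 0.5), (3.0, 0.4), (1.2, 0.3), (2.5, 0.2)]
chosen = wiener_select(balls)
```

In this example the selected intervals are pairwise disjoint, and every interval of the original family lies inside a 5-fold dilate of a selected one, which is the finite analog of the inclusion $(*)$.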


PROOF OF THEOREM 6.38. Let $E = \{x \mid f^*(x) > \xi\}$. If $x$ is in $E$, then
$m(B(r; 0))^{-1} \int_{B(r;x)} |f(y)|\,dy > \xi$ for some $r > 0$. Associate this $r$ to $x$ in
applying Lemma 6.41. Since
$$\xi < m(B(r; 0))^{-1} \int_{B(r;x)} |f(y)|\,dy \le r^{-N} m(B(1; 0))^{-1} \|f\|_1,$$
we see that $r^N < \xi^{-1} m(B(1; 0))^{-1} \|f\|_1$. Hence the numbers $r$ are bounded.
Thus the lemma applies, and we obtain
$$m(E) \le 5^N \sum_j m(B(r_j; x_j)) \le 5^N \xi^{-1} \sum_j \int_{B(r_j;x_j)} |f(y)|\,dy \le 5^N \xi^{-1} \|f\|_1,$$
the last inequality holding because of the disjointness of the sets $B(r_j; x_j)$.

Let us return to the theme of almost-everywhere convergence in connection
with approximate identities. Theorem 6.38 has the following consequence of just 
that kind. 


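Corollary 6.42. Let $\varphi \ge 0$ be a continuous integrable function on $\mathbb{R}^N$ of the
form $\varphi(x) = \varphi_0(|x|)$, where $\varphi_0$ is a decreasing $C^1$ function on $[0, \infty)$, and define
$\varphi_\epsilon(x) = \epsilon^{-N}\varphi(\epsilon^{-1}x)$ for $\epsilon > 0$. Then there is a constant $C_\varphi$ such that
$$\sup_{\epsilon>0} |(\varphi_\epsilon * f)(x)| \le C_\varphi f^*(x)$$
for all $x$ in $\mathbb{R}^N$ and for all $f$ in $L^1(\mathbb{R}^N)$. Consequently if $\int_{\mathbb{R}^N} \varphi(x)\,dx = 1$, then
$$\lim_{\epsilon\downarrow 0} (\varphi_\epsilon * f)(x) = f(x)$$
almost everywhere for each $f$ in $L^1(\mathbb{R}^N)$.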


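PROOF. Put $\psi(r) = -\varphi_0'(r) \ge 0$, so that $\varphi_0(r) - \varphi_0(R) = \int_r^R \psi(s)\,ds$. The
integrability of $\varphi$ and the fact that $\varphi_0$ is decreasing force $\lim_{R\to\infty} \varphi_0(R) = 0$, and
we obtain $\varphi_0(r) = \int_r^\infty \psi(s)\,ds$ and $\varphi(x) = \int_{|x|}^\infty \psi(r)\,dr$. Meanwhile, the
integrability of $\varphi$, together with the formula for integrating in spherical coordinates,
shows that $\int_0^\infty \varphi_0(r) r^{N-1}\,dr = C < +\infty$. Integrating by parts on the interval
$[0, M]$ gives
$$C \ge \int_0^M \varphi_0(r) r^{N-1}\,dr = \tfrac{1}{N}\bigl[\varphi_0(r) r^N\bigr]_0^M + \tfrac{1}{N}\int_0^M \psi(r) r^N\,dr,$$
and thus
$$\int_0^\infty \psi(r) r^N\,dr \le CN < +\infty.$$
The form of $\varphi$ implies that
$$\varphi_\epsilon(x) = \epsilon^{-N} \int_{|x|/\epsilon}^\infty \psi(r)\,dr.$$

If, as in the statement of Theorem 6.38, we let $B$ be the ball of radius 1 centered
at the origin, we obtain
$$\begin{aligned}
|(\varphi_\epsilon * f)(x)| &\le \int_{\mathbb{R}^N} \varphi_\epsilon(y)\,|f(x-y)|\,dy \\
&= \epsilon^{-N} \int_{\mathbb{R}^N} \int_{|y|/\epsilon}^{\infty} \psi(r)\,|f(x-y)|\,dr\,dy \\
&= \epsilon^{-N} \int_0^\infty \psi(r) \Bigl[\int_{|y|\le \epsilon r} |f(x-y)|\,dy\Bigr] dr \\
&= \int_0^\infty m(B)\,\psi(r)\, r^N \Bigl[m(\epsilon r B)^{-1} \int_{y\in \epsilon r B} |f(x-y)|\,dy\Bigr] dr \\
&\le m(B) \Bigl[\int_0^\infty \psi(r) r^N\,dr\Bigr] f^*(x).
\end{aligned}$$
The right side is $\le C_\varphi f^*(x)$ with $C_\varphi = CN\,m(B)$. Applying Theorem 6.38,
we see that the operator $f \mapsto \sup_{\epsilon>0} |(\varphi_\epsilon * f)(x)|$ is of weak type (1, 1). Since
$\varphi_\epsilon * f$ converges pointwise (and in fact uniformly) to $f$ when $f$ is in $C_{\mathrm{com}}(\mathbb{R}^N)$,
the same argument as for Corollary 6.39 shows that $\lim_{\epsilon\downarrow 0}(\varphi_\epsilon * f)(x) = f(x)$
almost everywhere for each $f$ in $L^1(\mathbb{R}^N)$.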

EXAMPLE. An example of a function $\varphi$ as in Corollary 6.42 is $P(x) = \dfrac{1}{\pi}\,\dfrac{1}{1+x^2}$
in $\mathbb{R}^1$. We shall see in Chapter VIII that the function $h(x, y) = (P_y * f)(x)$ for
this $\varphi$ is the natural function on the half plane $y > 0$ in $\mathbb{R}^2$ that is harmonic, i.e.,
has $\dfrac{\partial^2 h}{\partial x^2} + \dfrac{\partial^2 h}{\partial y^2} = 0$, and has boundary value $f$. Corollary 6.42 says that $h(x, y)$
has $f(x)$ as boundary values almost everywhere if $f$ is in $L^1(\mathbb{R}^1)$.


7. Fourier Series and the Riesz—Fischer Theorem 


As mentioned at the beginning of Chapter V, the use of the Riemann integral 
imposes some limitations on the subject of Fourier series that no longer apply 
when one uses the Lebesgue integral. In this section we shall redo the elementary 
theory of Fourier series of Section I.10 with the Lebesgue integral in place, with 
particular attention to the improved theorems that we obtain. It will be assumed 
that the reader knows the theory of that section. 

The underlying measure space will be $[-\pi, \pi]$ with the $\sigma$-algebra of Borel
sets and with the measure $\frac{1}{2\pi}\,dx$, where $dx$ is 1-dimensional Lebesgue measure.
The complex-valued functions under consideration will be periodic of period $2\pi$,
thus assuming the same value at $\pi$ as at $-\pi$. The spaces $L^1$, $L^2$, and $L^\infty$ will refer
to this measure space when no other parameters are given. Since the measure
of the whole space is finite, these spaces satisfy the inclusions $L^\infty \subseteq L^2 \subseteq L^1$.
The functions in $L^\infty$ being essentially bounded, they are certainly integrable and
square integrable. The inclusion $L^2 \subseteq L^1$ follows from the Schwarz inequality:
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |f|\,dx \le \Bigl(\frac{1}{2\pi}\int_{-\pi}^{\pi} |f|^2\,dx\Bigr)^{1/2} \Bigl(\frac{1}{2\pi}\int_{-\pi}^{\pi} 1\,dx\Bigr)^{1/2}.$$

There is another way of viewing this measure space that will be especially
helpful in relating convolution to the theory. Namely, a periodic function on the
line of period $2\pi$ may be viewed as a function on the unit circle of $\mathbb{C}$ with the angle
as parameter. In fact, convolution is a construction that combines group theory
with measure theory when the measure is invariant under the group, and that is
why convolution appears more natural on the circle than on $[-\pi, \pi]$. The limits
of integration do not have to be written differently from the way they are written
on the line, but we must remember that functions are to be extended periodically
when we interpret integrands. The factor $\frac{1}{2\pi}$ in front of the measure means that
all convolutions of functions are to contain this factor. Thus the definition of
convolution for nonnegative $f$ and $g$ is
$$(f * g)(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x - y)\,g(y)\,dy.$$


Convolution is commutative and associative on the circle just as in Proposition
6.13, and the various norm estimates of Section 2 are valid in the setting of the
circle. The use of dilations has no analog for the circle, and thus the circle has
no approximate identities of the form $\varphi_\epsilon$.

If $f$ is in $L^1$, the trigonometric series
$$\sum_{n=-\infty}^{\infty} c_n e^{inx} \quad\text{with}\quad c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx$$
is called the Fourier series of $f$. This time we regard the integral as a Lebesgue
integral. We write
$$f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx} \quad\text{and}\quad s_N(f; x) = \sum_{n=-N}^{N} c_n e^{inx}.$$
A Fourier series can be written also with cosines and sines, and the coefficients
$a_n$ and $b_n$ are unchanged from Section I.10.


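Theorem 6.43. Let $f$ be in $L^2$. Among all choices of $d_{-N}, \ldots, d_N$, the
expression
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \Bigl| f(x) - \sum_{n=-N}^{N} d_n e^{inx} \Bigr|^2 dx$$
is minimized uniquely by choosing $d_n$, for all $n$ with $|n| \le N$, to be the Fourier
coefficient $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx$. The minimum value is
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx - \sum_{n=-N}^{N} |c_n|^2.$$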


PROOF. The proof is the same as for Theorem 1.53. 


Corollary 6.44 (Bessel’s inequality). If $f$ is in $L^2$ with $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$,
then
$$\sum_{n=-\infty}^{\infty} |c_n|^2 \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx.$$
In particular, $\sum_{n=-\infty}^{\infty} |c_n|^2$ is finite.


PROOF. The proof is the same as for Corollary 1.54. 


Corollary 6.45 (Riemann–Lebesgue Lemma). If $f$ is in $L^1$ and has Fourier
coefficients $\{c_n\}_{n=-\infty}^{\infty}$, then $\lim_{|n|\to\infty} c_n = 0$.


REMARK. Since $L^2$ is properly contained in $L^1$, this corollary is not a special
case of Corollary 6.44, unlike the situation with the Riemann integral.

PROOF. The result is immediate from Corollary 6.44 in the case of $L^2$ functions
and in particular in the case of continuous functions. Write $c_n(h)$ for the
$n$th Fourier coefficient of any function $h$. Let $\epsilon > 0$ be given. Choose by
Corollary 6.27a a continuous $g$ with $\|f - g\|_1 < \epsilon/2$. Then choose $N$ such that
$|n| \ge N$ implies $|c_n(g)| < \epsilon/2$. Then $|c_n(f)| \le |c_n(f - g)| + |c_n(g)| < \epsilon$ since
$$|c_n(f - g)| \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |(f - g)(x)|\,|e^{-inx}|\,dx = \|f - g\|_1 < \epsilon/2.$$


Theorem 6.46 (Dini’s test). Let $f$ be in $L^1$, and fix $x$ in $[-\pi, \pi]$. If there is a
constant $\delta > 0$ such that
$$\int_{|t|<\delta} \frac{|f(x + t) - f(x)|}{|t|}\,dt < \infty,$$
then $\lim_N s_N(f; x) = f(x)$.


REMARK. The condition in the corresponding result for the Riemann integral,
namely Theorem 1.57, was that $|f(x + t) - f(x)| \le M|t|$ for $|t| < \delta$ and some
constant $M$. The condition in the present theorem is satisfied by $f(x) = \sqrt{|x|}$ at
$x = 0$, and the condition in the earlier theorem is not.


PROOF. The proof is the same as for Theorem 1.57 except that we need to 
appeal to the improved version of the Riemann—Lebesgue Lemma in Corollary 
6.45. 


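Now we work toward a proof of Parseval’s Theorem for all of $L^2$. We need to
know about Fourier coefficients of convolutions.


Proposition 6.47. If $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$ and $g(x) \sim \sum_{n=-\infty}^{\infty} d_n e^{inx}$, then
$(f * g)(x) \sim \sum_{n=-\infty}^{\infty} c_n d_n e^{inx}$.


PROOF. This is a consequence of Fubini’s Theorem and the translation invariance
of Lebesgue measure:
$$\begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi} (f * g)(x) e^{-inx}\,dx
&= \Bigl(\frac{1}{2\pi}\Bigr)^2 \int_{-\pi}^{\pi} \Bigl[\int_{-\pi}^{\pi} f(x - y) g(y) e^{-inx}\,dy\Bigr] dx \\
&= \Bigl(\frac{1}{2\pi}\Bigr)^2 \int_{-\pi}^{\pi} \Bigl[\int_{-\pi}^{\pi} f(x - y) g(y) e^{-inx}\,dx\Bigr] dy \\
&= \Bigl(\frac{1}{2\pi}\Bigr)^2 \int_{-\pi}^{\pi} \Bigl[\int_{-\pi}^{\pi} f(x) g(y) e^{-in(x+y)}\,dx\Bigr] dy \\
&= \Bigl(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx\Bigr) \Bigl(\frac{1}{2\pi}\int_{-\pi}^{\pi} g(y) e^{-iny}\,dy\Bigr).
\end{aligned}$$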

The proof of the version of Parseval’s Theorem for all of $L^2$ will make use of
the Fejér kernel $K_N(t)$ introduced in Section I.10. We do not need to recall the
exact formula for $K_N$, only the fact that it is a trigonometric polynomial of degree
$N$ with the following three properties:

(i) $K_N(x) \ge 0$,
(ii) $\frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(x)\,dx = 1$,
(iii) for any $\delta > 0$, $\sup_{\delta\le|x|\le\pi} K_N(x)$ tends to 0 as $N$ tends to infinity.

These three properties identified $K_N$ as an approximate identity in the setting of
periodic functions, and Fejér’s Theorem in the form of Theorem 1.59 gave the
consequence for convergence at points of continuity of $f$. With the Lebesgue
integral, we get also results about norm convergence in $L^1$ and $L^2$.


Theorem 6.48 (Fejér’s Theorem). Let $f$ be in $L^1$. Then

(a) $\lim_N \|K_N * f - f\|_1 = 0$ with no additional hypotheses on $f$,

(b) $\lim_N \|K_N * f - f\|_2 = 0$ if $f$ is also in $L^2$,

(c) $\lim_N (K_N * f)(x_0) = f(x_0)$ if $f$ is bounded on $[-\pi, \pi]$ and is continuous
at $x_0$,

(d) the convergence in (c) is uniform for $x_0$ in $E$ if $f$ is bounded on $[-\pi, \pi]$
and is uniformly continuous at the points of $E$,

(e) $\lim_N (K_N * f)(x_0) = \tfrac12\bigl(f(x_0+) + f(x_0-)\bigr)$ if $f$ is bounded on $[-\pi, \pi]$
and has right and left limits $f(x_0+)$ and $f(x_0-)$ at $x_0$.


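PROOF. For (a) and (b), let $p = 1$ or $p = 2$ as appropriate. Then
$$\begin{aligned}
\|K_N * f - f\|_p
&= \Bigl\| \frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(t)\,[f(x - t) - f(x)]\,dt \Bigr\|_{p,x} && \text{by (ii)} \\
&\le \frac{1}{2\pi}\int_{-\pi}^{\pi} K_N(t)\,\|f(x - t) - f(x)\|_{p,x}\,dt && \text{by (i) and Theorem 5.60} \\
&\le \sup_{|t|\le\delta} \|f(x - t) - f(x)\|_{p,x} + 2\Bigl(\sup_{\delta\le|t|\le\pi} K_N(t)\Bigr)\|f\|_p.
\end{aligned}$$
Given $\epsilon > 0$, choose $\delta$ by Proposition 6.16 to make the first term of the final
bound be $< \epsilon/2$, and then choose $N_0$ by (iii) to make the second term of the final
bound be $< \epsilon/2$ for $N \ge N_0$. Then the final bound is $< \epsilon$ for $N \ge N_0$.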

Parts (c) and (d) are proved exactly as in Theorem 1.59. For (e), we may
assume without loss of generality that $x_0 = 0$ because convolution commutes with
translations. If we can prove (e) for a single function $g$ with a jump discontinuity
at $x = 0$ equal to the jump for $f$, then we can apply (c) to $f - g$ and deduce (e)
for $f$. Let us see that such a function $g$ may be taken as a multiple of
$$h(x) = \begin{cases} \tfrac12(\pi - x) & \text{for } 0 < x < \pi, \\ \tfrac12(-\pi - x) & \text{for } -\pi < x < 0. \end{cases}$$
In fact, a computation at the beginning of Section I.10 shows explicitly that the
series $\sum_{n=1}^{\infty} (\sin nx)/n$ converges to $h(x)$ for $x \ne 0$, but we do not need this fact.
All that we need is that the series $\sum_{n=1}^{\infty} (\sin nx)/n$ is the Fourier series of $h$, a
fact that we can readily check from the definition. The sum of this series at $x = 0$
is manifestly 0, and this sum matches the average of the jumps $\tfrac12\bigl(\tfrac{\pi}{2} + (-\tfrac{\pi}{2})\bigr)$. The
Cesàro sums of the series $\sum_{n=1}^{\infty} (\sin nx)/n$ must have the same limit 0, according
to Theorem 1.47, and (e) is proved.
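
The behavior of the Cesàro sums of $\sum_{n\ge1} (\sin nx)/n$ can be examined numerically. The sketch below is an illustration and not part of the proof. It assumes the usual description of the $N$th Fejér mean as the average of the partial sums $s_0, \ldots, s_N$, which for this series equals $\sum_{n=1}^{N} (1 - \tfrac{n}{N+1})(\sin nx)/n$; the function names are hypothetical.

```python
import math

def partial_sum(N, x):
    """s_N(x) for the series sum_{n>=1} sin(nx)/n."""
    return sum(math.sin(n * x) / n for n in range(1, N + 1))

def fejer_mean(N, x):
    """Average of s_0(x), ..., s_N(x), written with the weights 1 - n/(N+1)."""
    return sum((1.0 - n / (N + 1.0)) * math.sin(n * x) / n for n in range(1, N + 1))

def h(x):
    """The function whose Fourier series is sum sin(nx)/n, for 0 < |x| < pi."""
    return 0.5 * (math.pi - x) if x > 0 else 0.5 * (-math.pi - x)

if __name__ == "__main__":
    # At the jump x = 0 every term vanishes, so the means equal the midpoint value 0.
    print(fejer_mean(1000, 0.0))
    # Away from the jump the means approach h(x).
    print(fejer_mean(1000, 1.0), h(1.0))
    # Positivity of the kernel keeps the Fejér means below sup|h| = pi/2,
    # whereas the partial sums overshoot near the jump.
    grid = [k / 1000.0 for k in range(1, 3141)]
    print(max(fejer_mean(50, x) for x in grid), max(partial_sum(50, x) for x in grid))
```

The comparison in the last line reflects property (i) of $K_N$: since $K_N \ge 0$ and has mean 1, $|(K_N * h)(x)|$ cannot exceed $\sup |h|$.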


Theorem 6.49 (Parseval’s Theorem). If $f$ is a function in $L^2$ with $f(x) \sim
\sum_{n=-\infty}^{\infty} c_n e^{inx}$, then
$$\lim_{N\to\infty} \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x) - s_N(f; x)|^2\,dx = 0$$
and
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |c_n|^2.$$

PROOF. From the first conclusion of Theorem 6.43, we obtain $0 \le \|f - s_N\|_2^2 \le
\|f - (K_N * f)\|_2^2$, and we know from Theorem 6.48b that $\|f - (K_N * f)\|_2^2$ tends
to 0. This proves the first formula, and the second formula follows by passing to
the limit in the second conclusion of Theorem 6.43.
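
As a worked check of the second formula, consider the function $h$ with $h(x) = \tfrac12(\pi - x)$ for $0 < x < \pi$ and $h(x) = \tfrac12(-\pi - x)$ for $-\pi < x < 0$, whose Fourier series is $\sum_{n\ge1} (\sin nx)/n$. Writing $\sin nx = (e^{inx} - e^{-inx})/(2i)$ gives $c_n = 1/(2in)$ for $n \ne 0$ and $c_0 = 0$, so both sides of the formula should equal $\pi^2/12$. The sketch below compares a partial sum of $\sum |c_n|^2$ with a midpoint-rule approximation of the integral; the function names and grid sizes are assumptions of the sketch.

```python
import math

def h(x):
    """h(x) = (pi - x)/2 on (0, pi) and (-pi - x)/2 on (-pi, 0)."""
    return 0.5 * (math.pi - x) if x > 0 else 0.5 * (-math.pi - x)

def coefficient_sum(N):
    """Sum of |c_n|^2 over 0 < |n| <= N, using |c_n|^2 = 1/(4 n^2)."""
    return sum(2.0 / (4.0 * n * n) for n in range(1, N + 1))

def mean_square(cells=20000):
    """Midpoint-rule approximation to (1/2pi) * integral of |h|^2 over [-pi, pi].

    An even number of cells keeps the jump at x = 0 on a cell boundary.
    """
    width = 2.0 * math.pi / cells
    total = sum(h(-math.pi + (k + 0.5) * width) ** 2 for k in range(cells))
    return total * width / (2.0 * math.pi)

if __name__ == "__main__":
    print(coefficient_sum(10000), mean_square(), math.pi ** 2 / 12)
```

The partial sums increase to the integral from below, in agreement with Bessel's inequality (Corollary 6.44).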


Corollary 6.50 (uniqueness theorem). If $f$ is in $L^1$ and has all Fourier
coefficients 0, then $f$ is the 0 element in $L^1$.


PROOF. Proposition 6.47 shows that the Fourier coefficients of $K_N * f$ are
$c_n(K_N * f) = c_n(K_N)\,c_n(f)$, and this is 0 for all $n$. By Proposition 6.18, $K_N * f$
is continuous, and thus $K_N * f = 0$ by Corollary 1.60. Since $K_N * f$ tends to $f$
in $L^1$ according to Theorem 6.48a, we conclude that $f$ is the 0 element in $L^1$.


Now we come to the Riesz–Fischer Theorem, which historically was a great
triumph for the Lebesgue integral over the Riemann integral. The result uses the
completeness of $L^2$ and has no counterpart with Riemann integration.


Theorem 6.51 (Riesz–Fischer Theorem). If $\{c_n\}$ is a given doubly infinite
sequence of complex numbers with $\sum_{n=-\infty}^{\infty} |c_n|^2 < \infty$, then there exists an $f$ in
$L^2$ whose Fourier series is $\sum_{n=-\infty}^{\infty} c_n e^{inx}$.

PROOF. Define $f_N(x) = \sum_{|n|\le N} c_n e^{inx}$. For $M > N$, Parseval’s Theorem
(Theorem 6.49) gives $\|f_M - f_N\|_2^2 = \sum_{N+1\le|n|\le M} |c_n|^2$, and the right side tends
to 0 as $M$ and $N$ tend to infinity because of the convergence of $\sum_{n=-\infty}^{\infty} |c_n|^2$.
Thus $\{f_N\}$ is a Cauchy sequence in $L^2$. By Theorem 5.59, $L^2$ is complete as a
metric space, and thus $\{f_N\}$ converges in $L^2$. Let $f$ be (a function representing)
the limit element in $L^2$. The inner product in $L^2$ is a continuous function of the
$L^2$ function in the first variable, and therefore the Fourier coefficients of $f$ satisfy
$$c_n(f) = (f, e^{inx}) = \lim_N\, (f_N, e^{inx}).$$
As soon as $N$ gets to be $\ge |n|$, $(f_N, e^{inx})$ equals $c_n$. Thus $c_n(f) = c_n$ for all $n$,
and $f$ has the required properties.


8. Stieltjes Measures on the Line 


A Stieltjes measure² is a Borel measure on $\mathbb{R}^1$. Lebesgue measure $dx$ is
an example, as is any measure $f(x)\,dx$ in which $f$ is nonnegative and Borel
measurable and is integrable on every bounded interval. A completely different
kind of Stieltjes measure is one that attaches nonnegative weights to countably
many points in such a way that the sum of the weights in any bounded interval
is finite. In this section we shall see that the Stieltjes measures stand in one-one
correspondence with a class of monotone functions on the line that we describe
shortly. We shall also obtain an integration-by-parts formula in which a Stieltjes
measure plays the role of the derivative of its corresponding monotone function.

If a Stieltjes measure $\mu$ is given, we associate to $\mu$ the function $F : \mathbb{R}^1 \to \mathbb{R}^1$
defined by
$$F(x) = \begin{cases} -\mu(x, 0] & \text{if } x \le 0, \\ \mu(0, x] & \text{if } x \ge 0. \end{cases}$$
The function $F$ is called the distribution function of $\mu$. It has the following
properties:³

(i) $F$ is nondecreasing, i.e., is monotone increasing,

(ii) $F$ is continuous from the right in the sense that $F(x_0) = \lim_{x\downarrow x_0} F(x)$
for every $x_0$ in $\mathbb{R}^1$, i.e., $\lim_n F(x_n) = F(x_0)$ whenever $\{x_n\}_{n\ge1}$ is a
sequence tending to $x_0$ such that $x_n \ge x_0$ for all $n \ge 1$,

(iii) $F(0) = 0$.


²Many books, this one included, take Stieltjes measures by definition to occur on the line.
However, there is a theory, albeit a somewhat unsatisfactory one, of “Stieltjes measures” in
higher-dimensional Euclidean space. It is of interest chiefly in probability theory.

³An alternative definition says $F(x)$ equals $-\mu[x, 0)$ and $\mu[0, x)$ in the two cases, and then
property (ii) says that $F$ is continuous from the left. The choice made here between these alternatives
is governed by keeping technicalities to a minimum in Section 10.

Properties (i) and (iii) are immediate from the definition. With (ii), there are two
cases according as the limit $x_0$ is $< 0$ or $\ge 0$, and both cases are settled by the
complete additivity of $\mu$.

The measure $\mu$ is completely determined by its distribution function $F$. In
fact, the definition of $F$ forces $\mu((a, b]) = F(b) - F(a)$, and Proposition 6.6
implies that $\mu$ is determined as a Borel measure by this formula.


Theorem 6.52. The Stieltjes measures $\mu$ stand in one-one correspondence
with the functions $F : \mathbb{R}^1 \to \mathbb{R}^1$ satisfying (i), (ii), and (iii), the correspondence
being that $F$ is the distribution function of $\mu$.


PROOF. We have seen that each $\mu$ leads to an $F$ and that $F$ uniquely determines
$\mu$. We need to see that every $F$ satisfying (i), (ii), and (iii) arises from some $\mu$.
If such a function $F$ is given, we define a set function $\mu$ on bounded intervals by
$$\begin{aligned}
\mu((a, b]) &= F(b) - F(a), \\
\mu((a, b)) &= \lim_n F(b - \tfrac1n) - F(a), \\
\mu([a, b]) &= F(b) - \lim_n F(a - \tfrac1n), \\
\mu([a, b)) &= \lim_n F(b - \tfrac1n) - \lim_n F(a - \tfrac1n).
\end{aligned}$$
We extend $\mu$ to the ring $\mathcal{R}$ of elementary subsets of $\mathbb{R}^1$, i.e., the ring of all finite
disjoint unions of bounded intervals, by setting $\mu$ of a finite disjoint union of
bounded intervals equal to the sum of the values of $\mu$ on each of the intervals,
just as with Lebesgue measure in Example 4 at the end of Section V.1.

To see that $\mu$ is unambiguously defined and is additive on $\mathcal{R}$, we readily
reduce matters, just as with Lebesgue measure, to showing that if an interval is
decomposed into the union of two smaller intervals, then $\mu$ of the union is the
sum of $\mu$ of the components. Thus let $a < b < c$, and let an interval $I$ from $a$
to $c$ be the union of an interval from $a$ to $b$ and an interval from $b$ to $c$. If the
interval $I$ from $a$ to $c$ is $(a, c)$, then the two possible cases are handled by
$$\mu((a, b)) + \mu([b, c)) = \lim_n F(b - \tfrac1n) - F(a) + \lim_n F(c - \tfrac1n) - \lim_n F(b - \tfrac1n) = \mu((a, c))$$
and
$$\mu((a, b]) + \mu((b, c)) = F(b) - F(a) + \lim_n F(c - \tfrac1n) - F(b) = \mu((a, c)).$$
If the interval $I$ from $a$ to $c$ has one or both endpoints present, then the computation
is the same except that $F(a)$ is replaced by $\lim_n F(a - \tfrac1n)$ if $a$ is in $I$ and
$\lim_n F(c - \tfrac1n)$ is replaced by $F(c)$ if $c$ is in $I$. Thus $\mu$ is unambiguously defined,
and it is nonnegative and additive.

The next step is to prove, just as with Lebesgue measure in Section V.1, that
$\mu$ is regular on $\mathcal{R}$ in the sense that for each $E$ in $\mathcal{R}$ and $\epsilon > 0$, we can find a
compact $K$ in $\mathcal{R}$ and an open $U$ in $\mathcal{R}$ such that $K \subseteq E \subseteq U$, $\mu(K) \ge \mu(E) - \epsilon$,
and $\mu(U) \le \mu(E) + \epsilon$. As with Lebesgue measure, the proof comes down to
the case that $E$ is a single interval, and this time there are four subcases. Choose
n large enough so that 2 < €, and then 


for $[a, b)$, take $K = [a,\, b - \tfrac1n]$ and $U = (a - \tfrac1n,\, b)$,
for $[a, b]$, take $K = [a,\, b]$ and $U = (a - \tfrac1n,\, b + \tfrac1n)$,
for $(a, b]$, take $K = [a + \tfrac1n,\, b]$ and $U = (a,\, b + \tfrac1n)$,
for $(a, b)$, take $K = [a + \tfrac1n,\, b - \tfrac1n]$ and $U = (a,\, b)$.

An exception occurs in the definition of $K$ if the listed left endpoint of $K$ exceeds
the listed right endpoint of $K$, and then $K$ is defined to be empty. Each of these
definitions contains a parameter $n$; if we write $K_n$ and $U_n$ for the corresponding
sets $K$ and $U$, then we can check from the definitions and property (ii) of the
function $F$ that $\lim_n \mu(K_n) = \mu(E)$ and $\lim_n \mu(U_n) = \mu(E)$. The regularity
condition for $E$ follows from these limit relations.

The next step is to prove that $\mu$ is completely additive on $\mathcal{R}$ by imitating
the proof for Lebesgue measure. In fact, the proof of Proposition 5.4 applies
word-for-word except that $m$ has to be changed to $\mu$ throughout and the word
“proposition” in the last sentence of the proof should be changed to “complete
additivity.”

Then $\mu$ extends to a measure on the Borel sets by Theorem 5.5. The extended
measure is $\sigma$-finite on $\mathbb{R}^1$ because $\mathbb{R}^1$ is the countable union of bounded intervals
and $\mu$ is finite on every bounded interval.

Finally we need to show that the distribution function $G$ of $\mu$ is equal to $F$.
Our definitions make
$$G(x) = \begin{cases} -\mu((x, 0]) = -(F(0) - F(x)) = F(x) & \text{if } x \le 0, \\ \mu((0, x]) = F(x) - F(0) = F(x) & \text{if } x \ge 0. \end{cases}$$
Thus $G = F$.

EXAMPLES. 


(1) Let $F$ be any continuous distribution function that has a continuous
derivative $f$ except possibly at finitely many points. If $x$ is a point of
continuity of $f$, then the Fundamental Theorem of Calculus (Theorem 1.32) gives
$\frac{d}{dx}\int_0^x f(t)\,dt = f(x)$. Put
$$G(x) = \begin{cases} -\int_x^0 f(t)\,dt & \text{if } x \le 0, \\ \int_0^x f(t)\,dt & \text{if } x \ge 0. \end{cases}$$
Then $G$ is a continuous distribution function, and the formula for the derivative of
the integral shows that $F'(x) = G'(x)$ except at finitely many points. Recursive
application of the Mean Value Theorem, starting from $x = 0$, to $F - G$ on
intervals having $F' - G' = 0$ in their interiors, shows that $F = G$ everywhere.
The Stieltjes measure $\mu$ associated to $F$, by the uniqueness in Theorem 6.52, is
given by
$$\mu(E) = \int_E f(t)\,dt.$$
The special case with $F(x)$ identically equal to $x$ has $f$ identically equal to 1,
and the measure is just Lebesgue measure.


(2) The function $F$ with
$$F(x) = \begin{cases} 0 & \text{for } x \ge 0, \\ -1 & \text{for } x < 0, \end{cases}$$
has the three properties of a distribution function, and the associated measure $\mu$
is a point mass assigning weight 1 to $x = 0$. The measure $\mu$ takes the value 1
on every Borel set containing 0 and takes the value 0 on every Borel set not
containing 0. This measure is sometimes called the delta measure at 0 or “delta
mass” at 0. Whenever a Stieltjes measure $\nu$ has $\nu(\{p\}) > 0$ for some $p$ in $\mathbb{R}^1$,
we say that $\nu$ contains a point mass at $p$ of weight $\nu(\{p\})$. Then $\nu$ is the sum of
a point mass at $p$ of weight $\nu(\{p\})$ and a Stieltjes measure containing no point
mass at $p$.


(3) Let $\{x_n\}$ be a sequence in $\mathbb{R}^1$. For example, $\{x_n\}$ could be an enumeration
of the rationals. Let $\{w_n\}$ be a sequence of positive numbers with $\sum_n w_n < \infty$,
and define
$$F(x) = \begin{cases} \displaystyle\sum_{\{n \,\mid\, 0 < x_n \le x\}} w_n & \text{for } x \ge 0, \\[2ex] \displaystyle -\sum_{\{n \,\mid\, x < x_n \le 0\}} w_n & \text{for } x < 0. \end{cases}$$
Then $F$ satisfies (i), (ii), and (iii), and hence $F$ is the distribution function of some
Stieltjes measure $\mu$. The measure is given by
$$\mu((a, b]) = \sum_{\{n \,\mid\, a < x_n \le b\}} w_n.$$
It is a countable sum of point masses. The function $F$, though monotone increasing,
is discontinuous at every $x_n$, and this set is allowed to be dense.
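
Example 3 can be made concrete when only finitely many point masses are present. The sketch below is an illustration under that finiteness assumption, with hypothetical function names and with exact rational arithmetic so that the identities can be compared exactly. It checks the relation $\mu((a, b]) = F(b) - F(a)$ and the normalization $F(0) = 0$.

```python
from fractions import Fraction

def distribution_function(points, weights, x):
    """F(x) for the measure placing weight weights[n] at points[n] (finitely many points)."""
    if x >= 0:
        return sum((w for p, w in zip(points, weights) if 0 < p <= x), Fraction(0))
    return -sum((w for p, w in zip(points, weights) if x < p <= 0), Fraction(0))

def measure_half_open(points, weights, a, b):
    """mu((a, b]) computed directly as the sum of the weights of the points in (a, b]."""
    return sum((w for p, w in zip(points, weights) if a < p <= b), Fraction(0))

# Illustrative data: four point masses, including one at 0 and one at a negative point.
points = [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(2)]
weights = [Fraction(1), Fraction(2), Fraction(3), Fraction(4)]

if __name__ == "__main__":
    for a, b in [(-2, 1), (-1, 0), (0, 2), (Fraction(1, 2), 3)]:
        lhs = measure_half_open(points, weights, a, b)
        rhs = (distribution_function(points, weights, b)
               - distribution_function(points, weights, a))
        print(a, b, lhs, rhs)
```

Note that the point mass at 0 contributes to $F(x)$ only for $x < 0$, which is what makes $F(0) = 0$ and $F$ continuous from the right at 0.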

(4) This example will be a nonzero Stieltjes measure that is carried on a Borel
set of Lebesgue measure 0 and yet has a continuous distribution function. We start
from the standard Cantor set $C$ in $[0, 1]$ described in detail in Section II.9. This set
is compact and is obtained as the intersection of a sequence $\{C_n\}$ of sets with each
$C_n$ consisting of the finite union of closed bounded intervals. The set $C_0$ is $[0, 1]$,
and $C_{n+1}$ is obtained from $C_n$ by removing the open middle third of each of the
constituent closed intervals of $C_n$. The Lebesgue measure of $C_n$ is $(2/3)^n$, and
thus $C$ has Lebesgue measure $\lim_n (2/3)^n = 0$. The measure $\mu$ we construct will
have $\mu(C) = 1$ and $\mu(C^c) = 0$, yet it will assign 0 measure to every one-point
set. The properties that are needed of the corresponding distribution function $F$
so that $\mu$ has these properties are that $F$ is continuous, $F$ is 0 for $x \le 0$, $F$ is 1
for $x \ge 1$, and $F$ is constant on every open interval $I$ of $[0, 1] - C$, i.e., on every
open interval of every $[0, 1] - C_n$. This condition will make $\mu(I) = 0$ for all
such $I$. Since the metric space $[0, 1] - C$ has a countable base, it is the union of
countably many such open intervals $I$, and thus $\mu([0, 1] - C) = 0$. Since $F$ is
constant for $x \le 0$ and for $x \ge 1$, $\mu$ is 0 on $(-\infty, 0)$ and $(1, +\infty)$ as well, and
thus $\mu(C^c) = 0$. To obtain the distribution function $F$, we construct a sequence
of approximating functions $F_n$ and show that the sequence is uniformly Cauchy.
The set $C_n^c \cap [0, 1]$ is the union of $2^n - 1$ disjoint open intervals. On the $k$th such
interval we define $F_n$ to be $k2^{-n}$. We let $F_n(x) = 0$ for $x \le 0$ and $F_n(x) = 1$
for $x \ge 1$. On the complementary closed intervals, define $F_n$ in any fashion that
makes $F_n$ monotone increasing and continuous. Graphs of $F_1$ through $F_4$ are
shown in Figure 6.1 with the interpolation in the graphs done by straight lines.

FIGURE 6.1. Construction of a Cantor function $F$. Graphs of
approximations $F_1$, $F_2$, $F_3$, $F_4$ to $F$.

The result is that
$$|F_n(x) - F_{n+1}(x)| \le 2^{-n}$$
for all $x$. Hence $\{F_n\}$ is uniformly Cauchy and therefore uniformly convergent.
Let $F$ be the limit function. The function $F$ is continuous by Theorem 1.21, and
it is monotone increasing, satisfies $F(0) = 0$, and is constant on every open
interval contained in $C^c$. According to Problem 15 at the end of the chapter, it
is independent of the method of interpolation used in constructing the $F_n$'s. The
function $F$ is called the Cantor function corresponding to the standard Cantor
set.
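
Values of the Cantor function can be computed exactly at rational points. The sketch below does not follow the construction by uniform limits given in the example; it uses instead the well-known description by base-3 digits (read the ternary digits of $x$ up to and including the first digit 1, replace each 2 by 1, and interpret the result in base 2). The function name and the truncation depth are assumptions of the sketch.

```python
from fractions import Fraction

def cantor_function(x, depth=40):
    """Cantor function at a rational x, via its ternary digits (truncated at `depth`)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value = Fraction(0)
    weight = Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x lies in a removed middle third (or at its left endpoint):
            # the function is constant there.
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value

if __name__ == "__main__":
    # On the removed intervals of the second stage the values are k / 4, k = 1, 2, 3.
    for x in (Fraction(1, 6), Fraction(1, 2), Fraction(5, 6)):
        print(x, cantor_function(x))
```

The values on the removed middle thirds agree with the prescription $F_n = k2^{-n}$ on the $k$th open interval of $C_n^c \cap [0, 1]$.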


The most general monotone increasing function $F$ on $\mathbb{R}^1$ is not far from
being the distribution function of some Stieltjes measure. In the first place the
monotonicity of $F$ implies that $F$ has left and right limits at every point, and
consequently its only discontinuities are jumps. There can be only countably
many such jumps: in fact, if there were uncountably many jump discontinuities,
there would be uncountably many in some bounded interval, and that interval
would contain uncountably many of magnitude at least $1/n$ for some integer $n$;
hence $F$ would have to be unbounded on that bounded interval. Let us define
a function $F_1$ by $F_1(x) = \lim_{t\downarrow x} F(t)$. This is well defined, since $F$ has right
limits at every point, and we have $F_1(x) = F(x)$ except on a countable set. If we
define $F_2(x) = F_1(x) - F_1(0)$, then $F_2$ satisfies the three defining properties (i),
(ii), (iii) of a distribution function. If $\mu$ is the Stieltjes measure corresponding to
$F_2$ under Theorem 6.52, we call $\mu$ the associated Stieltjes measure for $F$.


Theorem 6.53 (integration by parts). Let $a < b$, let $F$ be a monotone increasing
function on $\mathbb{R}^1$ that is continuous from the right at $a$ and $b$, and let $\mu$ be the
associated Stieltjes measure. If $G$ is a $C^1$ complex-valued function on $[a, b]$ with
derivative $g$, then
$$\int_a^b F(x) g(x)\,dx = G(b)F(b) - G(a)F(a) - \int_{(a,b]} G\,d\mu.$$


PROOF. Without loss of generality, we may assume that $G$ is real-valued. Let
$F_2$ be the distribution function of $\mu$. By construction of $F_2$, there is a constant $c$
such that $F - F_2 = c$ except possibly at points of discontinuity of $F$, and the set
$S$ of such points within $[a, b]$ is countable. This exceptional countable set $S$
does not contain $a$ or $b$, since $F$ and $F_2$ are continuous from the right at $a$ and $b$.
We have
$$\int_a^b (F - F_2)g\,dx = \int_S (F - F_2)g\,dx + \int_a^b c\,g(x)\,dx = c\int_a^b g(x)\,dx = c(G(b) - G(a))$$
and also
$$G(b)(F(b) - F_2(b)) - G(a)(F(a) - F_2(a)) = G(b)c - G(a)c = c(G(b) - G(a)).$$
Thus
$$\int_a^b (F - F_2)g\,dx = G(b)(F(b) - F_2(b)) - G(a)(F(a) - F_2(a)).$$
Comparing this formula with the formula in the statement of the theorem, we see
that if the theorem holds for $F_2$, then it holds for $F$. Changing notation, we may
therefore assume that $F$ is the distribution function of $\mu$.

Let $P$ be a partition $a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$ of $[a, b]$ with
mesh to be specified. For $1 \le i \le n$, we use the Mean Value Theorem to choose
$t_i \in (x_{i-1}, x_i)$ with $G(x_i) - G(x_{i-1}) = g(t_i)(x_i - x_{i-1})$. We can do so since we
have assumed that $G$ is real valued. Then we have

$$F(x_i)g(t_i)(x_i - x_{i-1}) = F(x_i)G(x_i) - F(x_i)G(x_{i-1})$$

and

$$\begin{aligned}
\sum_{i=1}^n F(x_i)g(t_i)(x_i - x_{i-1})
&= F(x_n)G(x_n) + \sum_{i=2}^{n} F(x_{i-1})G(x_{i-1}) - \sum_{i=1}^{n} F(x_i)G(x_{i-1}) \\
&= F(x_n)G(x_n) - F(x_0)G(x_0) - \sum_{i=1}^n G(x_{i-1})(F(x_i) - F(x_{i-1})) \\
&= F(b)G(b) - F(a)G(a) - \sum_{i=1}^n G(x_{i-1})(F(x_i) - F(x_{i-1})).
\end{aligned}$$

We shall show for small enough mesh that $\sum_{i=1}^n F(x_i)g(t_i)(x_i - x_{i-1})$ is close to
$\int_a^b F(x)g(x)\,dx$ and that $\sum_{i=1}^n G(x_{i-1})(F(x_i) - F(x_{i-1}))$ is close to $\int_{(a,b]} G\,d\mu$.
Let $M$ be an upper bound for $|g|$ and $|F|$ on $[a, b]$, and let $\epsilon > 0$ be given.
Choose a number $\delta > 0$ by uniform continuity of $G$ and $g$ such that $|x - x'| < \delta$
implies $|G(x) - G(x')| < \epsilon/M$ and $|g(x) - g(x')| < \epsilon/(M(b - a))$, as well as
another condition to be specified. If the mesh of the partition is $< \delta$, then
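$$\begin{aligned}
\Big|\sum_{i=1}^n G(x_{i-1})(F(x_i) - F(x_{i-1})) - \int_{(a,b]} G\,d\mu\Big|
&= \Big|\sum_{i=1}^n \int_{(x_{i-1},x_i]} [G(x_{i-1}) - G(x)]\,d\mu(x)\Big| \\
&\le \sum_{i=1}^n \int_{(x_{i-1},x_i]} |G(x_{i-1}) - G(x)|\,d\mu(x) \\
&\le \sum_{i=1}^n \int_{(x_{i-1},x_i]} (\epsilon/M)\,d\mu \\
&= (\epsilon/M)(F(b) - F(a)) \\
&\le 2\epsilon \qquad \text{since } |F| \le M.
\end{aligned}$$

Also,

$$\begin{aligned}
\Big|\sum_{i=1}^n F(x_i)g(t_i)(x_i - x_{i-1}) - \int_a^b F(x)g(x)\,dx\Big|
&= \Big|\sum_{i=1}^n \int_{x_{i-1}}^{x_i} \big[F(x_i)(g(t_i) - g(x)) + (F(x_i) - F(x))g(x)\big]\,dx\Big| \\
&\le \sum_{i=1}^n \int_{x_{i-1}}^{x_i} |F(x_i)|\,|g(t_i) - g(x)|\,dx + \sum_{i=1}^n \int_{x_{i-1}}^{x_i} |F(x_i) - F(x)|\,|g(x)|\,dx \\
&\le \sum_{i=1}^n \int_{x_{i-1}}^{x_i} M\big(\epsilon/(M(b - a))\big)\,dx + \sum_{i=1}^n (F(x_i) - F(x_{i-1}))M\delta \\
&= \epsilon + (F(b) - F(a))M\delta \qquad \text{by monotonicity of } F \\
&\le \epsilon + 2M^2\delta.
\end{aligned}$$

Thus if $\delta$ satisfies the additional condition that $\delta < \epsilon/(2M^2)$, then the absolute
value of the difference of the two sides in the formula of the theorem is $< 2\epsilon + 2\epsilon =
4\epsilon$. This completes the proof.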
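
The summation-by-parts step in the proof can be checked numerically. The following is a minimal sketch, not part of the original text: the particular choices $F(x) = x$ for $x < 1$ and $F(x) = x + 1$ for $x \ge 1$ (a unit jump at 1, continuous from the right), $G(x) = x^2$, and the interval $[0, 2]$ are illustrative assumptions, as is the helper name `parts_sums`. For this $F$ the associated Stieltjes measure is Lebesgue measure plus a unit point mass at 1, so both sides of the theorem equal $25/3$.

```python
def F(x):
    # Monotone increasing, continuous from the right, unit jump at x = 1
    # (an illustrative choice, not taken from the text).
    return x + 1.0 if x >= 1.0 else x

def G(x):
    return x * x

def g(x):
    # Derivative of G.
    return 2.0 * x

def parts_sums(a, b, n):
    """Return the two sides of the discrete identity from the proof, plus the
    Stieltjes sum, for the uniform partition of [a, b] into n pieces."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    left = 0.0       # sum of F(x_i) g(t_i) (x_i - x_{i-1})
    stieltjes = 0.0  # sum of G(x_{i-1}) (F(x_i) - F(x_{i-1}))
    for i in range(1, n + 1):
        x0, x1 = xs[i - 1], xs[i]
        # For G(x) = x^2 the midpoint satisfies the Mean Value Theorem exactly.
        t = 0.5 * (x0 + x1)
        left += F(x1) * g(t) * (x1 - x0)
        stieltjes += G(x0) * (F(x1) - F(x0))
    right = G(b) * F(b) - G(a) * F(a) - stieltjes
    return left, right, stieltjes

left, right, stieltjes = parts_sums(0.0, 2.0, 2000)
print(left, right, stieltjes)
```

The two sums `left` and `right` agree up to rounding for every partition, which is the algebraic identity in the proof; only the comparison with the integrals depends on the mesh being small.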


9. Fourier Series and the Dirichlet—Jordan Theorem 


A real-valued function $f$ on a bounded interval $[a, b]$ is said to be of bounded
variation on $[a, b]$ if there is a constant $M$ such that every partition

$$P:\ a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$$

has

$$\sum_{i=1}^n |f(x_i) - f(x_{i-1})| \le M.$$

Let us write $\|f\|_{BV}$ for the least $M$ such that this inequality holds. The set of
functions $f$ of bounded variation on $[a, b]$ is a pseudonormed linear space in the
sense of Section V.9, the pseudonorm being $\|\cdot\|_{BV}$. The only functions $f$ with
$\|f\|_{BV} = 0$ are the constants.

Examples of functions of bounded variation are furnished by arbitrary bounded
monotone functions and by any function with a continuous derivative. In fact,
if $f$ is monotone increasing, then $f$ is of bounded variation with $\|f\|_{BV} =
f(b) - f(a)$. If $f$ has $f'$ continuous, then the Mean Value Theorem gives

$$\begin{aligned}
\sum_{i=1}^n |f(x_i) - f(x_{i-1})| &= \sum_{i=1}^n |f'(t_i)|(x_i - x_{i-1}) \qquad \text{with } x_{i-1} < t_i < x_i \\
&\le \|f'\|_{\sup} \sum_{i=1}^n (x_i - x_{i-1}) \\
&= \|f'\|_{\sup}(b - a),
\end{aligned}$$

and we see that $f$ is of bounded variation with $\|f\|_{BV} \le \|f'\|_{\sup}(b - a)$.
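
As a quick numerical illustration of this bound, here is a hedged sketch in which the function $f(x) = \sin 3x$ on $[0, 2]$ and the helper name `variation_sum` are assumptions chosen for the example. The partition sums stay below $\|f'\|_{\sup}(b - a) = 6$, and they increase as the uniform partition is refined, approaching the total variation $4 + \sin 6$ of this particular $f$.

```python
import math

def variation_sum(f, a, b, n):
    """Sum of |f(x_i) - f(x_{i-1})| over the uniform partition of [a, b] into n pieces."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(f(x1) - f(x0)) for x0, x1 in zip(xs, xs[1:]))

def f(x):
    return math.sin(3.0 * x)

a, b = 0.0, 2.0
sup_derivative = 3.0               # sup of |f'(x)| = |3 cos 3x| on [0, 2], attained at x = 0
bound = sup_derivative * (b - a)   # the bound from the Mean Value Theorem argument

# Each partition below refines the previous one, so the sums cannot decrease.
sums = [variation_sum(f, a, b, n) for n in (10, 100, 1000)]
print(sums, bound)
```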

Let us associate two functions on $[a, b]$ to $f$ if $f$ is of bounded variation. For a
real number $r$, define $r^+ = \max\{r, 0\}$ and $r^- = -\min\{r, 0\}$, so that $r = r^+ - r^-$
and $|r| = r^+ + r^-$. The functions are the positive and negative variations of $f$,
given by

$$V^+(f)(x) = \sup_{\substack{P \text{ with } x_0 = a \\ \text{and } x_n = x}} \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^+,$$

$$V^-(f)(x) = \sup_{\substack{P \text{ with } x_0 = a \\ \text{and } x_n = x}} \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^-,$$

the supremum in each case being taken over all partitions $P$ of $[a, x]$.


Proposition 6.54. If $f$ is of bounded variation on $[a, b]$, then $V^+(f)$ and
$V^-(f)$ are monotone increasing functions such that

$$f(x) = f(a) + V^+(f)(x) - V^-(f)(x)$$

for all $x$ in $[a, b]$. In particular, $f$ is the difference of two monotone increasing
functions.


REMARK. Since monotone functions have left and right limits at each point,
it follows that every function $f$ of bounded variation has left and right limits at
each point. We denote these by $f(x-)$ and $f(x+)$, respectively. The function $f$
is continuous from the left at $x$ if and only if $f(x-) = f(x)$, and it is continuous
from the right at $x$ if and only if $f(x) = f(x+)$.
PROOF. It is evident from the definitions that $V^+(f)$ and $V^-(f)$ are monotone
increasing. Fix $x$, and let $P$ be a partition of $[a, x]$. Then

$$f(x) - f(a) = \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)
= \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^+ - \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^-.$$

Hence

$$\sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^+ = \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^- + (f(x) - f(a))
\le V^-(f)(x) + (f(x) - f(a)),$$

and

$$V^+(f)(x) \le V^-(f)(x) + (f(x) - f(a)).$$

Also,

$$\sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^- = \sum_{i=1}^n \big(f(x_i) - f(x_{i-1})\big)^+ - (f(x) - f(a))
\le V^+(f)(x) - (f(x) - f(a)),$$

and

$$V^-(f)(x) \le V^+(f)(x) - (f(x) - f(a)).$$

Therefore

$$f(x) - f(a) = V^+(f)(x) - V^-(f)(x),$$

and the proof is complete.
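
The identity that drives this proof holds along every single partition and is easy to check by machine. The sketch below is an illustration only: the name `partition_variations` and the sample values are hypothetical, and the code computes the sums for one fixed partition, not the suprema $V^+(f)(x)$ and $V^-(f)(x)$ themselves.

```python
def partition_variations(values):
    """Given values[i] = f(x_i) along one partition a = x_0 < ... < x_n = x,
    return the sums of the positive parts and of the negative parts of the
    increments f(x_i) - f(x_{i-1})."""
    pos = 0.0
    neg = 0.0
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        pos += max(d, 0.0)    # r^+ = max{r, 0}
        neg += -min(d, 0.0)   # r^- = -min{r, 0}
    return pos, neg

values = [1.0, 3.0, 2.0, 2.5, 0.0]   # hypothetical samples of f along a partition
pos, neg = partition_variations(values)
total = sum(abs(c - p) for p, c in zip(values, values[1:]))
print(pos, neg, total)
```

Here `pos - neg` reproduces $f(x) - f(a)$ and `pos + neg` reproduces the sum of the absolute increments, matching $r = r^+ - r^-$ and $|r| = r^+ + r^-$ term by term.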


Theorem 6.55 (Dirichlet-Jordan Theorem). If $f$ is a function of bounded
variation on $[-\pi, \pi]$, then the Fourier series of $f$ converges at each point to
$\frac{1}{2}(f(x-) + f(x+))$, and it converges uniformly to $f(x)$ on any compact set on
which the periodic extension of $f$ is continuous.


By way of preparation, it will be convenient to extend the definition of Fourier
series to allow integrable functions to be replaced by more general Borel measures.
If $\mu$ is a Borel measure on $[-\pi, \pi]$, we want to be able to regard $\mu$ as periodic.
One way to proceed would be to insist that $\mu$ really be a measure on the circle
group, hence be defined on $(-\pi, \pi]$. Alternatively, we could insist that any point
mass contributing to $\mu$ at $-\pi$ be matched by an equal point mass for $\mu$ at $\pi$.
A way of avoiding point masses contributing at the endpoints is to change the
interval $[-\pi, \pi]$ to a suitable $[c - \pi, c + \pi]$; we can find a number $c$ with no point
masses at the ends because only countably many point masses can contribute to
$\mu$ and still have $\mu$ be a finite measure. In any event, we define the Fourier-Stieltjes
series of $\mu$ to be the series

$$\sum_{n=-\infty}^{\infty} c_n e^{inx} \qquad \text{with} \qquad c_n = \int_{(-\pi,\pi]} e^{-inx}\,d\mu(x).$$

The usual factor of $\frac{1}{2\pi}$ is dropped because we identify an integrable function $f \ge 0$
with the measure $\frac{1}{2\pi} f\,dx$ when making the generalization. From the definition
of the Fourier-Stieltjes coefficients, we see immediately that $|c_n| \le \mu((-\pi, \pi])$;
hence the coefficients are bounded.


PROOF OF THEOREM 6.55. We take the given function $f$ to be periodic of
period $2\pi$. On some closed interval $[a, b]$ containing $[-\pi, \pi]$ in its interior, let
us decompose $f$ according to Proposition 6.54 as $f = f(a) + V^+(f) - V^-(f)$.
It is then enough to prove the theorem for the monotone increasing functions
$f(a) + V^+(f)$ and $V^-(f)$ separately. These functions need to be extended to
all of $\mathbb{R}^1$, and we may make that extension by taking them to be constant to the
left of $[a, b]$ and to the right of $[a, b]$.

Changing notation in the theorem, we may assume from the outset that $f$ is
monotone and bounded, though no longer periodic. Neither the Fourier coefficients
of $f$ nor the hoped-for values of the sum of the Fourier series are changed
if we adjust $f$ on a subset of the countable set where $f$ is discontinuous. Thus
we may assume without loss of generality that $f$ is continuous from the right at
every point. Let $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$ be its Fourier series.

Let $\mu$ be the Stieltjes measure associated to $f$. Applying integration by parts
(Theorem 6.53) on the interval $[-\pi, \pi]$ with $G(x) = e^{-inx}$ and $g(x) = -ine^{-inx}$,
we obtain

$$\int_{-\pi}^{\pi} f(x)(-in)e^{-inx}\,dx = e^{-in\pi}f(\pi) - e^{in\pi}f(-\pi) - \int_{(-\pi,\pi]} e^{-inx}\,d\mu(x).$$

The left side is $-2\pi i n c_n$, and the right side is the sum of two bounded terms
and the negative of a Fourier-Stieltjes coefficient of $\mu$. These Fourier-Stieltjes
coefficients are bounded, and hence $|c_n| \le C/|n|$ for some constant $C$.

Let $s_N(x) = \sum_{n=-N}^{N} c_n e^{inx}$ be the $N$th partial sum of the Fourier series
of $f$, and let $\sigma_N(x) = \frac{1}{N+1}\sum_{n=0}^{N} s_n(x)$ be the $N$th Cesàro sum. We know
that $\sigma_N(x) = (K_N * f)(x)$, where $K_N$ is the Fejér kernel. Fejér's Theorem
(Theorem 6.48) shows that $\lim_N \sigma_N(x) = \frac{1}{2}(f(x-) + f(x+))$ for all $x$ and
that $\lim_N \sigma_N(x) = f(x)$ uniformly on any compact set of points where $f$
is continuous. The Tauberian theorem stated as Proposition 1.50 allows us to
conclude that $s_N(x)$ converges and has the same limit as $\sigma_N(x)$ if it is shown that
the sequence $n(c_n e^{inx} + c_{-n}e^{-inx})$ is bounded for $n > 0$. But this boundedness
is immediate from the estimate $|c_n| \le C/|n|$ for the Fourier coefficients of $f$.
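
A concrete case may help fix ideas. The sketch below is an illustration under stated assumptions, not part of the text: it takes the sawtooth $f(x) = x$ on $(-\pi, \pi)$, extended periodically, whose Fourier coefficients are $c_n = i(-1)^n/n$ for $n \ne 0$ and $c_0 = 0$, so that $n|c_n| = 1$ is bounded and $s_N(x) = 2\sum_{n=1}^{N} (-1)^{n+1}\sin(nx)/n$. The helper name `partial_sum` is hypothetical. At the jump $x = \pi$ the partial sums sit at the midpoint $\frac{1}{2}(f(\pi-) + f(\pi+)) = 0$, while at the point of continuity $x = 1$ they approach $f(1) = 1$.

```python
import math

def partial_sum(x, N):
    """N-th symmetric partial sum s_N(x) of the Fourier series of the sawtooth
    f(x) = x on (-pi, pi), extended with period 2*pi."""
    return sum(2.0 * (-1) ** (n + 1) * math.sin(n * x) / n for n in range(1, N + 1))

at_continuity = partial_sum(1.0, 5000)   # should be close to f(1) = 1
at_jump = partial_sum(math.pi, 5000)     # should be close to the midpoint value 0
print(at_continuity, at_jump)
```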


10. Distribution Functions 


This section concerns the computation of integrals. A measure space $(X, \mathcal{A}, \rho)$
will be fixed throughout. A need to estimate integrals arises in two quite distinct
situations, and the emphasis is different for the two situations. One is in
connection with problems in Fourier analysis and differential equations, and the
underlying measure space typically has $X$ equal to $\mathbb{R}^N$, $\mathcal{A}$ equal to the $\sigma$-algebra
of Borel sets, and $\rho$ equal to Lebesgue measure. The other is in connection with
probability theory, and the underlying measure space is typically a complicated
space with $\rho(X) = 1$. Although the word “distribution” acquires multiple
meanings in the process, the theory can begin at the same point in the two cases.

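Let $f : X \to \mathbb{R}$ be a measurable function. We define a measure $\mu_f$ on the
Borel sets of $\mathbb{R}$ and a function $\lambda_f : (0, +\infty) \to [0, +\infty]$ by

$$\mu_f(E) = \rho(f^{-1}(E)) = \rho(\{x \in X \mid f(x) \in E\}) \qquad \text{for each Borel set } E,$$
$$\lambda_f(\xi) = \rho(|f|^{-1}((\xi, +\infty))) = \rho(\{x \in X \mid |f(x)| > \xi\}).$$


Proposition 6.56. If $f : X \to \mathbb{R}$ is a measurable function, then
(a) $\int_X \Phi(f(x))\,d\rho(x) = \int_{\mathbb{R}} \Phi(t)\,d\mu_f(t)$ for every nonnegative Borel measurable
function $\Phi : \mathbb{R} \to \mathbb{R}$,

(b) $\int_X \Phi(|f(x)|)\,d\rho(x) = \int_0^{\infty} \lambda_f(\xi)\varphi(\xi)\,d\xi$ whenever $\varphi(\xi)\,d\xi$ is a Stieltjes
measure on $\mathbb{R}^1$ and $\Phi$ is its distribution function.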


PROOF. In (a), when $\Phi$ is an indicator function $I_E$, the two sides of the identity
are $\rho(f^{-1}(E))$ and $\mu_f(E)$, and these are equal by definition of $\mu_f$. We can
pass to nonnegative simple functions by linearity and then to general nonnegative
Borel measurable functions $\Phi$ by monotone convergence.

In (b), when $f$ is a nonnegative simple function $s$, let $s = \sum_{j=1}^n c_j I_{E_j}$ be the
canonical expansion of $s$ as a linear combination of indicator functions, with the
$c_j$'s arranged so that $c_1 > c_2 > \cdots > c_n \ge 0$. Put $c_{n+1} = 0$. Then we have

$$\begin{aligned}
\int_0^{\infty} \lambda_s(\xi)\varphi(\xi)\,d\xi
&= \sum_{k=1}^n \int_{c_{k+1}}^{c_k} \rho\Big(\bigcup_{j=1}^k E_j\Big)\varphi(\xi)\,d\xi \\
&= \sum_{k=1}^n \rho\Big(\bigcup_{j=1}^k E_j\Big)\big[\Phi(c_k) - \Phi(c_{k+1})\big] \\
&= \sum_{k=1}^n \sum_{j=1}^k \rho(E_j)\big[\Phi(c_k) - \Phi(c_{k+1})\big] \\
&= \sum_{j=1}^n \sum_{k=j}^n \rho(E_j)\big[\Phi(c_k) - \Phi(c_{k+1})\big] \\
&= \sum_{j=1}^n \rho(E_j)\Phi(c_j) \qquad \text{since } \Phi(0) = 0 \\
&= \sum_{j=1}^n \int_{E_j} \Phi(s(x))\,d\rho(x) \\
&= \int_X \Phi(s(x))\,d\rho(x).
\end{aligned}$$

This proves (b) for nonnegative simple functions $f$. For a general measurable
$|f|$ on $X$, choose an increasing sequence of nonnegative simple functions $s_n$
with pointwise limit $|f|$. The definition of $\Phi$ in terms of $\varphi$ makes $\Phi$ monotone
increasing and continuous, and thus $\Phi(s_n(x))$ increases to $\Phi(|f(x)|)$. Also, the
set $\{x \in X \mid |f(x)| > \xi\}$, for each fixed $\xi$, is the increasing union of the sets
$\{x \in X \mid s_n(x) > \xi\}$, and thus $\lambda_{s_n}(\xi) = \rho(\{x \in X \mid s_n(x) > \xi\})$ increases to
$\lambda_f(\xi) = \rho(\{x \in X \mid |f(x)| > \xi\})$ for each $\xi$. Hence we can pass to the limit in
the identity for each $s_n$ and obtain the identity for $|f|$ by monotone convergence.
This proves (b) for a general measurable $|f|$.


For applications to Fourier analysis and differential equations, it is (b) that is
important, and the function $\Phi$ of most interest is $\Phi(t) = t^p$ with $0 < p < +\infty$.
The formula in this case is

$$\int_X |f(x)|^p\,d\rho(x) = p\int_0^{\infty} \lambda_f(\xi)\,\xi^{p-1}\,d\xi.$$

Somewhat unfortunately, the function $\lambda_f$ is called the distribution function of
$f$; the term does not conflict with the notion of the “distribution function” of a
Stieltjes measure as long as one does not make any associations between functions
and measures.

A special case of the displayed formula is that $X$ is $\mathbb{R}^N$, $\rho$ is Lebesgue measure,
and $p$ is 1. In this case the formula simplifies to $\int_{\mathbb{R}^N} |f(x)|\,dx = \int_0^{\infty} \lambda_f(\xi)\,d\xi$, a
formula that was mentioned after the statement of the Hardy-Littlewood Maximal
Theorem (Theorem 6.38).

The displayed formula shows that $\int_X |f|^p\,d\rho$ can be computed from the function
$\lambda_f$, and it is apparent that the integral cannot be finite if $\lambda_f(\xi)$ is everywhere
$\ge$ some positive multiple of $\xi^{-p}$. This observation can be improved upon without
the aid of Proposition 6.56 in the following way. We have

$$\int_X |f(x)|^p\,d\rho(x) \ge \int_{\{x \in X \,\mid\, |f(x)| > \xi\}} |f(x)|^p\,d\rho(x) \ge \xi^p\rho(\{x \in X \mid |f(x)| > \xi\}).$$

Thus we obtain

$$\rho(\{x \in X \mid |f(x)| > \xi\}) \le \frac{\int_X |f|^p\,d\rho}{\xi^p},$$

an inequality that goes under the name Chebyshev's inequality.
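
Both the displayed formula for $\Phi(t) = t^p$ and Chebyshev's inequality can be checked on a finite measure space, where the function in question is a step function and the integral over the levels can be evaluated exactly as in the proof of part (b). The following is a minimal sketch under assumptions that are not in the text: the space has four points with hypothetical weights and values, and the helper names `lam` and `layer_cake` are made up for the illustration.

```python
def lam(values, weights, xi):
    """Distribution function: total weight of the points where |f| > xi."""
    return sum(w for v, w in zip(values, weights) if abs(v) > xi)

def layer_cake(values, weights, p):
    """Evaluate p * integral_0^infinity lam(xi) * xi**(p-1) d(xi) exactly.
    Between consecutive levels of |f| the function lam is constant, and the
    integral of p * xi**(p-1) over [lo, hi] equals hi**p - lo**p."""
    levels = sorted({abs(v) for v in values} | {0.0})
    total = 0.0
    for lo, hi in zip(levels, levels[1:]):
        total += lam(values, weights, lo) * (hi ** p - lo ** p)
    return total

values = [-2.0, 0.5, 1.0, 3.0]    # hypothetical values f(x) at four points
weights = [0.1, 0.4, 0.3, 0.2]    # hypothetical masses of those points
p = 2.0

direct = sum(w * abs(v) ** p for v, w in zip(values, weights))
via_levels = layer_cake(values, weights, p)
xi = 1.5
print(direct, via_levels, lam(values, weights, xi), direct / xi ** p)
```

For these numbers both computations of the integral give 2.6 up to rounding, and the mass of the set where $|f| > 1.5$ is well below the Chebyshev bound $2.6/1.5^2$.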




11. Problems 


1. Let $S^1$ be the unit circle of $\mathbb{C}$, let $T$ be the subgroup of elements of finite order,
and let $E$ be a subset of $S^1$ that contains exactly one element of each coset in
$S^1/T$. (Such a set $E$ exists by the Axiom of Choice.) Prove that $E$ is not a
Lebesgue measurable subset of the circle and therefore that the corresponding
subset of $(-\pi, \pi]$ is not Lebesgue measurable on $\mathbb{R}^1$.


2. Let $\varphi$ be the mapping given explicitly in Section 5 that allows one to substitute
in an expression in Cartesian coordinates and obtain an expression in spherical
coordinates. Let $U$ be the domain of $\varphi$. Prove that

(a) the determinant factor in the change-of-variables formula is given by

$$|\det \varphi'| = r^{N-1}\sin^{N-2}\theta_1 \sin^{N-3}\theta_2 \cdots \sin\theta_{N-2},$$

(b) $\varphi$ is one-one on $U$,
(c) the complement of $\varphi(U)$ in $\mathbb{R}^N$ is a lower-dimensional set.


3. Let $L$ be a nonsingular $N$-by-$N$ real matrix. Prove that

$$\int_{\mathbb{R}^N} f(Lx)\,dx = |\det L|^{-1}\int_{\mathbb{R}^N} f(x)\,dx$$

for every nonnegative Borel measurable function $f$.


4. Let $M_N$ denote the $N^2$-dimensional Euclidean space of all real $N$-by-$N$ matrices,
and let $dx$ refer to its Lebesgue measure. Prove that

$$\int_{M_N} f(yx)\,\frac{dx}{|\det x|^N} = \int_{M_N} f(x)\,\frac{dx}{|\det x|^N}$$

for each nonsingular matrix $y$ and Borel measurable function $f \ge 0$. In the
formula, $yx$ is the matrix product of $y$ and $x$.


5. Fix $\alpha$ with $0 < \alpha < 1$. Suppose $f : \mathbb{R} \to \mathbb{C}$ is periodic of period $2\pi$, is
smooth except at multiples of $2\pi$, and satisfies the inequalities $|f(x)| \le C|x|^{\alpha}$,
$|f'(x)| \le C|x|^{\alpha-1}$, and $|f''(x)| \le C|x|^{\alpha-2}$ for $|x| \le 1$.

(a) By breaking the integral at $|x| = 1/|n|$, prove that the Fourier coefficients
$c_n$ of $f$ satisfy $|c_n| \le K/|n|^{1+\alpha}$.

(b) How can one conclude from (a) that the Fourier series of $f$ converges
uniformly? Why is the limit equal to $f$?

(c) Prove or disprove: The real and imaginary parts of the function $f$ are of
bounded variation on every bounded interval.


6. Let $\mu$ be a nonzero measure on the $\sigma$-algebra of all subsets of $\mathbb{R}^1$ assigning to
each set either measure 0 or measure 1. Prove that $\mu$ is a point mass.

7. Determine all Stieltjes measures $\nu \ne 0$ on the line with

$$\int fg\,d\nu = \Big(\int f\,d\nu\Big)\Big(\int g\,d\nu\Big)$$

for all continuous nonnegative functions $f$ and $g$.


Problems 8-10 make use of Fubini's Theorem in unexpected ways.


8. (a) Show that the complement of any Lebesgue measurable set of Lebesgue
measure 0 in $\mathbb{R}^N$ is dense.

(b) Let $\mu$ be a Stieltjes measure on the line, and let $E$ be a Borel set in $\mathbb{R}^1$
with Lebesgue measure 0. Prove that $\mu(E + t) = 0$ for almost every $t$ with
respect to Lebesgue measure.

(c) Suppose that a Stieltjes measure $\mu$ on the line satisfies $\lim_{t \to 0} \mu(E + t) =
\mu(E)$ for each bounded Borel set $E$ in $\mathbb{R}^1$. Prove that $\mu(E) = 0$ for every
Borel set $E$ of Lebesgue measure 0.


9. In potential theory a positive charge on $\mathbb{R}^3$ is by definition any finite Borel measure
$\mu$, and its potential $h$ is the function $h(x) = \int_{\mathbb{R}^3} \frac{d\mu(y)}{|x - y|}$. Prove that the potential
is finite almost everywhere with respect to Lebesgue measure.


10. Let $P(x_1, \ldots, x_n)$ be a real-valued polynomial on $\mathbb{R}^n$ that is not identically 0.
Prove by induction that the set in $\mathbb{R}^n$ where $P = 0$ has Lebesgue measure 0.


Problems 11-14 concern the gamma function and some associated changes of
variables.


11. Prove that

$$\int_0^1 t^{x-1}(1 - t)^{y-1}\,dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}$$

by starting from the product of $\Gamma(x + y)$ and the left side, substituting for $\Gamma(x + y)$,
making a change of variables, using Fubini's Theorem, and making another
change of variables.


12. By evaluating the integral $\int_{\mathbb{R}^N} e^{-\pi|x|^2}\,dx$ first in Cartesian coordinates by means
of Proposition 6.33 and then in spherical coordinates by means of the change-
of-variables formula for multiple integrals, obtain an expression for the area
$\Omega_{N-1} = \int_{S^{N-1}} d\omega$ of the sphere $S^{N-1}$. Express the answer in terms of a value of
the gamma function.


13. Let $I$ be the “cube” of all $u = (u_1, \ldots, u_n)$ in $\mathbb{R}^n$ with $0 \le u_i \le 1$ for all $i$, and
let $S$ be the “simplex” of all $x = (x_1, \ldots, x_n)$ in $\mathbb{R}^n$ with $x_i \ge 0$ for all $i$ and
$\sum_{i=1}^n x_i \le 1$. Define $x = \varphi(u)$ by

$$\begin{aligned}
x_1 &= u_1, \\
x_2 &= (1 - u_1)u_2, \\
&\;\;\vdots \\
x_n &= (1 - u_1)\cdots(1 - u_{n-1})u_n.
\end{aligned}$$

(a) Prove that $\sum_{i=1}^n x_i = 1 - \prod_{i=1}^n (1 - u_i)$.
(b) Prove that $\varphi$ maps $I$ one-one onto $S$, with inverse given by

$$u_i = \frac{x_i}{1 - x_1 - \cdots - x_{i-1}}.$$

(c) Prove that $|\det \varphi'(u)| = (1 - u_1)^{n-1}(1 - u_2)^{n-2}\cdots(1 - u_{n-1})$ and that

$$|\det(\varphi^{-1})'(x)| = (1 - x_1)^{-1}(1 - x_1 - x_2)^{-1}\cdots(1 - x_1 - \cdots - x_{n-1})^{-1}.$$


14. Using Problems 11 and 13, prove for the simplex $S$ in Problem 13 that

$$\int_S x_1^{a_1-1}x_2^{a_2-1}\cdots x_n^{a_n-1}\,dx = \frac{\Gamma(a_1)\Gamma(a_2)\cdots\Gamma(a_n)}{\Gamma(a_1 + \cdots + a_n + 1)}$$

when $a_j > 0$ for all $j$.


Problems 15—17 concern the Cantor function for the standard Cantor set. 


15. Prove that the values of the Cantor function $F$ for the standard Cantor set
are independent of the method of defining the approximating functions $F_n$ on
the complementary closed intervals as long as $F_n$ is monotone increasing and
continuous.


16. Compute $\int_0^1 F(x)\,dx$ if $F$ is the Cantor function for the standard Cantor set.


17. The Stieltjes measure $\mu$ corresponding to the Cantor function for the standard
Cantor set $C$ is called the Cantor measure. The set $C$ consists of the members
of $[0, 1]$ that can be expanded in the digits 0, 1, 2 of base 3 without using any 1's.
Show, for each $n$-tuple of 0's and 2's, that $\mu$ attaches measure $2^{-n}$ to the subset
of $C$ whose base 3 expansion begins with that $n$-tuple.


Problems 18-20 introduce the Poisson integral formula for the unit disk in $\mathbb{R}^2$. The
Poisson kernel was the subject of Problems 27-29 at the end of Chapter I and is given
by

$$P_r(\theta) = \sum_{n=-\infty}^{\infty} r^{|n|}e^{in\theta} = \frac{1 - r^2}{1 - 2r\cos\theta + r^2}.$$

Harmonic functions in the unit disk were the subject of Problems 14-15 at the end
of Chapter III and also Problems 10-13 at the end of Chapter IV. The present set of
problems begins to relate the Poisson kernel to harmonic functions via convolution.


18. If $f$ is in $L^1([-\pi, \pi], \frac{1}{2\pi}\,d\theta)$, then the Poisson integral of $f$ is the function in
the unit disk defined in polar coordinates by

$$u(r, \theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\varphi)P_r(\theta - \varphi)\,d\varphi.$$

If $c_n$ is the $n$th Fourier coefficient of $f$, prove that $u(r, \theta) = \sum_{n=-\infty}^{\infty} c_n r^{|n|}e^{in\theta}$,
and conclude that $u$ is harmonic in the open unit disk.


19. If $p$ equals 1 or 2 and if $f$ is in $L^p([-\pi, \pi], \frac{1}{2\pi}\,d\theta)$, prove that the Poisson
integral $u(r, \theta)$ of $f$ has the properties that $\|u(r, \cdot)\|_p \le \|f\|_p$ for $0 \le r < 1$
and that $u(r, \cdot)$ tends to $f$ in $L^p$ in the sense that $\lim_{r \uparrow 1} \|u(r, \cdot) - f\|_p = 0$.


20. Suppose that $f$ is in $L^{\infty}([-\pi, \pi], \frac{1}{2\pi}\,d\theta)$ and that $u(r, \theta)$ is the Poisson integral
of $f$.

(a) Prove that $\lim_{r \uparrow 1} u(r, \theta) = f(\theta)$ uniformly on any set of $\theta$'s where $f$ is
uniformly continuous.

(b) For $f$ of class $C^2$, prove that the Poisson integral of $f$ is the only harmonic
function $u(r, \theta)$ in the disk such that $\lim_{r \uparrow 1} u(r, \theta) = f(\theta)$ uniformly in $\theta$.

(c) Prove that $u(r, \cdot)$ tends to $f$ weak-star in $L^{\infty}$ relative to $L^1$ in the sense that
$\lim_{r \uparrow 1} \frac{1}{2\pi}\int_{-\pi}^{\pi} u(r, \theta)g(\theta)\,d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)g(\theta)\,d\theta$ for all $g$ in $L^1$. (Weak-star
convergence was defined in Section V.9.)


Problems 21-25 concern functions of bounded variation. For such a function $f$, the
positive and negative variations of $f$ were defined in Section 9, and their values at $x$
were denoted by $V^+(f)(x)$ and $V^-(f)(x)$.


21. Prove that the product of two functions of bounded variation on $[a, b]$ is of
bounded variation.


22. This problem concerns a certain minimality property of the decomposition
$f(x) = f(a) + V^+(f)(x) - V^-(f)(x)$ of a function $f$ of bounded variation on
$[a, b]$. Prove that if $g_1$ and $g_2$ are any two nonnegative monotone increasing
functions such that $f(x) = f(a) + g_1(x) - g_2(x)$ for all $x$, then $V^+(f)(x) \le g_1(x)$
and $V^-(f)(x) \le g_2(x)$.


23. Prove that if $f$ is of bounded variation on $[a, b]$ and is continuous at a point $x$ in
$(a, b)$, then both $V^+(f)$ and $V^-(f)$ are continuous at $x$.


24. If $f$ is of bounded variation on $[a, b]$, define the total variation of $f$ as the
function given by

$$V(f)(x) = \sup_{\substack{P \text{ with } x_0 = a \\ \text{and } x_n = x}} \sum_{i=1}^n |f(x_i) - f(x_{i-1})|,$$

the supremum being taken over all partitions of $[a, x]$. Prove that $V(f)(x) =
V^+(f)(x) + V^-(f)(x)$ for all $x$.


25. Prove that the function $f$ on $[-1, 1]$ given by

$$f(x) = \begin{cases} x\sin(1/x) & \text{for } x \ne 0, \\ 0 & \text{for } x = 0, \end{cases}$$

is not of bounded variation. Prove or disprove that the function $g$ on $[-1, 1]$
given by

$$g(x) = \begin{cases} x^2\sin(1/x) & \text{for } x \ne 0, \\ 0 & \text{for } x = 0, \end{cases}$$

is of bounded variation.


CHAPTER VII 


Differentiation of Lebesgue Integrals on the Line 


Abstract. This chapter concerns the Fundamental Theorem of Calculus for the Lebesgue integral, 
viewed from Lebesgue’s perspective but slightly updated. 

Section 1 contains Lebesgue’s main tool, a theorem saying that monotone functions on the line 
are differentiable almost everywhere. A relatively easy consequence is Fubini’s theorem that an 
absolutely convergent series of monotone increasing functions may be differentiated term by term. 
The result that the indefinite integral $\int_a^x f(t)\,dt$ of a locally integrable function $f$ is differentiable
almost everywhere with derivative $f$ follows readily.

Section 2 addresses the converse question of what functions $F$ have the property for a particular $f$
that the integral $\int_a^b f(t)\,dt$ can be evaluated as $F(b) - F(a)$ for all $a$ and $b$. The development involves
a decomposition theorem for monotone increasing functions and a corresponding decomposition
theorem for Stieltjes measures. The answer to the converse question when $f \ge 0$ and $F' = f$
almost everywhere is that $F$ is “absolutely continuous” in a sense defined in the section.


1. Differentiation of Monotone Functions 


The generalization of the Fundamental Theorem of Calculus to the Lebesgue 
integral was the crowning achievement of Lebesgue’s book. We have already 
stated and proved a particular result in that direction as Corollary 6.40, using a 
more recent method that is of continual applicability in analysis. The statement
of the part of the Fundamental Theorem in that corollary is that $\int_a^x f(t)\,dt$ is
differentiable almost everywhere with derivative $f(x)$ if $f$ is a Borel function on
the line that is integrable on every bounded interval.

In this chapter we shall develop that and allied results using something closer to 
Lebesgue’s original method. These allied results are chiefly of historical interest, 
no longer being of great importance as analytic tools. However, their beauty 
is undeniable and by itself justifies their inclusion in this book. In addition, 
these allied results motivate some results in Chapter IX, particularly the Radon— 
Nikodym Theorem, that might seem strange indeed if the historical background 
were omitted. 

The starting point is the almost-everywhere differentiability of monotone func- 
tions on the line, given in Theorem 7.2 below. Since monotone functions include 
the distribution functions of Stieltjes measures, this differentiability shows at 
once that functions of the form $\int_a^x f(t)\,dt$ with $f \ge 0$ are differentiable almost
everywhere, and then we are well on our way toward a more traditional proof of the
Fundamental Theorem for the Lebesgue integral. The advantage of starting with
all monotone functions is that one can address at the same time the differentiability
of all distribution functions of Stieltjes measures, not just those of measures
$f(t)\,dt$. From this fact one can attack the question of how close the derivative
$f(t)$ is to determining the function of which it is the derivative almost everywhere.
This is the second aspect of the traditional Fundamental Theorem of Calculus as
in Theorem 1.32: for the case of continuous $f$, any two functions with derivative
$f$ everywhere on an interval differ by a constant.

There is a certain formal similarity between the theory of differentiation of 
monotone functions and the theory of the Hardy—Littlewood Maximal Theorem 
as in Chapter VI. Wiener’s Covering Lemma captured the geometric core of the 
theorem in Chapter VI, and another covering lemma captures the geometric core 
here. This is the Rising Sun Lemma, which will be given as Lemma 7.1. 

By way of preliminaries, any open subset $U$ of $\mathbb{R}^1$ is uniquely the union of
countably many disjoint open intervals, the open interval containing a point $x$ in
$U$ being the union of all connected subsets of $U$ containing $x$. These sets give
the required decomposition of $U$ by Propositions 2.48 and 2.51. An open subset
of an interval $(a, b)$ is necessarily open in $\mathbb{R}^1$, and hence it too is uniquely the
countable union of disjoint open intervals.


Lemma 7.1 (Rising Sun Lemma).¹ Let $g : [a, b] \to \mathbb{R}$ be continuous, and
define

$$E = \{x \in (a, b) \mid \text{there exists } \xi \in (a, b) \text{ with } \xi > x \text{ and } g(\xi) > g(x)\}.$$

The set $E$ is open in $(a, b)$. If $E$ is written as the disjoint union of open intervals
with endpoints $a_k$ and $b_k$, then $g(a_k) \le g(b_k)$ for each $k$.


FIGURE 7.1. Rising Sun Lemma. Graph showing three open intervals
produced by the lemma.


¹Some authors call this result Riesz's Lemma.
REMARK. The Rising Sun Lemma is so named because of the situation in
Figure 7.1. The sun rises in the east, viewed as the direction of the positive $x$
axis. It casts shadows within the graph of $g$, and the content of the lemma is the
nature of those shadows. Although the conclusion of the lemma is that $g(a_k) \le
g(b_k)$ for all $k$, the reader can observe in the figure that $g(a_k) = g(b_k)$ for the
open intervals that are shown. This observation is valid in general except possibly
when $a_k = a$, but the observation is not needed in the proof of Theorem 7.2 below.


PROOF. If $x_0 \in E$ and $\xi \in (a, b)$ have $\xi > x_0$ and $g(\xi) > g(x_0)$, then every $x$
in $(a, \xi)$ with $|g(x) - g(x_0)| < \frac{1}{2}(g(\xi) - g(x_0))$ lies in $E$. Hence $E$ is open.

Let $E$ be the disjoint union of intervals $(a_k, b_k)$. Fix attention on one such
interval $(a_k, b_k)$. We make critical use of the fact that the point $b_k$ is not in $E$. If
$x_0$ satisfies $a_k < x_0 < b_k$, we prove that $g(x_0) \le g(b_k)$. Once we do so, we can
let $x_0$ decrease to $a_k$ and use continuity to obtain the assertion $g(a_k) \le g(b_k)$ of
the lemma.

Arguing by contradiction, suppose that $g(x_0) > g(b_k)$. Since $x_0$ is in $E$,
there exists $x_1 > x_0$ with $g(x_1) > g(x_0)$. If $x_1 > b_k$, then the inequality
$g(x_1) > g(x_0) > g(b_k)$ forces $b_k$ to be in $E$. Since $b_k$ is not in $E$, we conclude
that $x_1 \le b_k$. The set of all $x$ with $x_1 \le x \le b_k$ and $g(x) \ge g(x_1)$ is closed,
bounded, and nonempty, and we let $x_2$ be its largest element, so that $x_2 \le b_k$.

Since $g(x_2) \ge g(x_1) > g(x_0) > g(b_k)$, we must have $x_2 < b_k$; in fact,
$x_2 = b_k$ would yield the contradiction $g(b_k) > g(b_k)$. From $a_k < x_0 < x_2 < b_k$
and $(a_k, b_k) \subseteq E$, we see that $x_2$ is in $E$. Hence there is some $\xi > x_2$ with
$g(\xi) > g(x_2)$. Then the conditions $g(\xi) > g(b_k)$ and $b_k \notin E$ together force $\xi$ to
be $< b_k$. So $x_2 < \xi < b_k$ with $g(\xi) > g(x_2)$, in contradiction to the maximality
of $x_2$. This contradiction allows us to conclude that $g(x_0) \le g(b_k)$, and the proof
is complete.
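
A finite analogue of the lemma can be explored directly. The sketch below is an illustration and not the lemma itself: it replaces the continuous $g$ by a finite list of sample values (an assumption of the example), marks the indices that have a strictly larger value somewhere to their right, and groups them into maximal runs. The helper name `rising_sun_runs` is hypothetical. In this discrete setting every value inside a run is strictly smaller than the value at the first index just past the run, which plays the role of $b_k$.

```python
def rising_sun_runs(g):
    """For a finite list g, return the maximal runs (start, end) of indices i
    such that some j > i has g[j] > g[i]. Indices start..end-1 are 'in shadow';
    index end is the first index past the run and is not in shadow."""
    n = len(g)
    in_shadow = [False] * n
    best = float("-inf")  # largest value seen strictly to the right of i
    for i in range(n - 1, -1, -1):
        in_shadow[i] = best > g[i]
        best = max(best, g[i])
    runs = []
    i = 0
    while i < n:
        if in_shadow[i]:
            start = i
            while i < n and in_shadow[i]:
                i += 1
            runs.append((start, i))
        else:
            i += 1
    return runs

g = [5, 1, 2, 3, 4, 2, 0, 1, 0.5]   # hypothetical sample values
runs = rising_sun_runs(g)
print(runs)
```

The last index is never in shadow, so the right endpoint of each run is always a valid index of the list.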


Theorem 7.2 (Lebesgue). If $F$ is a monotone increasing function on an
interval, then $F$ is differentiable almost everywhere in this sense: the set where
$F$ fails to be differentiable is a Lebesgue measurable set of Lebesgue measure 0.
In addition, if the definition of $F'$ is extended so that $F'(x) = 0$ at every point
where $F$ is not differentiable, then $F'$ is Lebesgue measurable.


REMARK. Recall that any monotone increasing function $F$ can have only
countably many discontinuities, and these are all given by jumps. In other words,
$F$ has, at each point $x$, left and right limits $F(x-)$ and $F(x+)$, and the only
possible discontinuities occur when one or both of the equalities $F(x-) = F(x)$
and $F(x) = F(x+)$ fail.


PROOF. The second statement is a consequence of the first. In fact, if $E$ is
the Lebesgue measurable set of measure 0 where $F$ is nondifferentiable and if $B$
is a Borel set of measure 0 containing $E$, then the sequence of Borel functions
$G_n(x) = n(F(x + 1/n) - F(x))$ converges everywhere on $B^c$ to a function
$G$. If $G$ is extended to the domain of $F$ by defining it to be 0 on $B$, then $G$ is a
Borel function that equals $F'$ except on a subset of $B$, and hence $F'$ is Lebesgue
measurable.

Let us come to the conclusion about differentiability. Possibly by taking the
union of countably many sets, we may assume that the domain of $F$ is a bounded
interval $[a, b]$. For $a < x < b$, define

$$U_r(x) = \limsup_{h \downarrow 0} \tfrac{1}{h}(F(x + h) - F(x)) \qquad \text{and} \qquad L_r(x) = \liminf_{h \downarrow 0} \tfrac{1}{h}(F(x + h) - F(x)),$$

$$U_l(x) = \limsup_{h \uparrow 0} \tfrac{1}{h}(F(x + h) - F(x)) \qquad \text{and} \qquad L_l(x) = \liminf_{h \uparrow 0} \tfrac{1}{h}(F(x + h) - F(x)).$$

We shall prove that

$$U_r(x) < +\infty \qquad \text{and} \qquad U_r(x) \le L_l(x)$$

almost everywhere. If the latter inequality is applied to $-F(-x)$, we obtain also

$$U_l(x) \le L_r(x)$$

almost everywhere. Putting these inequalities together, we have $U_r(x) \le L_l(x) \le
U_l(x) \le L_r(x) \le U_r(x)$ almost everywhere, and equality must hold throughout,
almost everywhere. The points where equality holds throughout and also $U_r(x) <
+\infty$ are the points where $F$ is differentiable, and hence the two inequalities
$U_r(x) < +\infty$ and $U_r(x) \le L_l(x)$ prove the theorem.

For most of the proof, we shall assume that $F$ is continuous. At the end we
return and show how to modify the proof to handle discontinuous $F$. First we
consider the inequality $U_r(x) < +\infty$. The subset $E$ of $(a, b)$ where this inequality
fails is, for each positive integer $n$, contained in the set where $U_r(x) > n$. If
$U_r(x) > n$, then $\frac{F(\xi) - F(x)}{\xi - x} > n$ for some $\xi > x$. That is, $g(\xi) > g(x)$ for
the continuous function $g(x) = F(x) - nx$. In the notation of Lemma 7.1, $E$ is
covered by a system of disjoint open intervals $(a_k, b_k)$ such that $g(a_k) \le g(b_k)$
for each such interval. Thus $n(b_k - a_k) \le F(b_k) - F(a_k)$ for each. Summing
on $k$ gives $n\sum_k (b_k - a_k) \le \sum_k (F(b_k) - F(a_k)) \le F(b) - F(a)$. Thus the
exceptional set $E$ can be covered by a system of open intervals of total measure
$\le \frac{1}{n}(F(b) - F(a))$. Since $n$ is arbitrary, Proposition 5.39 shows that $E$ is Lebesgue
measurable of Lebesgue measure 0.
Next we prove that $U_r(x) \le L_l(x)$ almost everywhere on $(a, b)$. If $0 < p < q$
are rational numbers, we prove that the set $E_{pq}$ where

$$L_l(x) < p < q < U_r(x)$$

has Lebesgue measure 0. The countable union of such sets is the exceptional set
in question, and thus we will have proved that the exceptional set has measure 0.
If $L_l(x) < p$, then there exists $\xi \in (a, b)$ with $\xi < x$ and $\frac{F(\xi) - F(x)}{\xi - x} < p$,
hence with $p\xi - F(\xi) < px - F(x)$. Define $g(z) = pz + F(-z)$ for $z$ in
$[-b, -a]$. If $y = -x$ and $\eta = -\xi$, then $p\eta + F(-\eta) > py + F(-y)$ and
hence $g(\eta) > g(y)$ with $\eta > y$. Applying Lemma 7.1 to $g$ on the interval
$[-b, -a]$, we obtain a disjoint system of open intervals $(-b_i, -a_i)$ covering the
set of $y$'s where $L_l(-y) < p$ and having $g(-b_i) \le g(-a_i)$ in each case. Thus
$-pb_i + F(b_i) \le -pa_i + F(a_i)$. In other words, the set of $x$'s where $L_l(x) < p$
is covered by a disjoint system of open intervals $(a_i, b_i)$ such that

$$F(b_i) - F(a_i) \le p(b_i - a_i) \tag{$*$}$$

for each such interval. Applying the lemma to $g_q(x) = F(x) - qx$ on the interval
$[a_i, b_i]$, we obtain a disjoint system of open intervals $(a_{ij}, b_{ij})$ indexed by $j$ and
having $g_q(a_{ij}) \le g_q(b_{ij})$. Thus ($*$) and

$$q(b_{ij} - a_{ij}) \le F(b_{ij}) - F(a_{ij}) \tag{$**$}$$

hold in each case. Summing ($**$) over $j$, we obtain

$$q\sum_j (b_{ij} - a_{ij}) \le \sum_j (F(b_{ij}) - F(a_{ij})) \le F(b_i) - F(a_i) \le p(b_i - a_i). \tag{$\dagger$}$$

Summing this inequality over $i$ and dividing by $q$ gives

$$m(E_{pq}) \le \sum_{i,j} (b_{ij} - a_{ij}) \le (p/q)(b - a).$$

If we repeat this argument with $[a_{ij}, b_{ij}]$ in place of $[a, b]$, we obtain intervals
$(a_{ijuv}, b_{ijuv})$ and an inequality

$$m(E_{pq}) \le \sum_{i,j,u,v} (b_{ijuv} - a_{ijuv}) \le (p/q)\sum_{i,j} (b_{ij} - a_{ij}) \le (p/q)^2(b - a).$$

Iteration gives $m(E_{pq}) \le (p/q)^n(b - a)$ for every $n$, and therefore $m(E_{pq}) = 0$.
This completes the proof in the case that $F$ is continuous.
If $F$ is possibly discontinuous, we modify Lemma 7.1 and the present proof
as follows. Each function $g$ that arises has right and left limits $g(x+)$ and $g(x-)$
at each point $x$, and we let $G(x)$ be the largest of $g(x-)$, $g(x)$, and $g(x+)$.
A modified Lemma 7.1 says that the set of $x$ in $(a, b)$ for which there is some
$\xi \in (a, b)$ with $\xi > x$ and $g(\xi) > G(x)$ is an open set whose component intervals
$(a_k, b_k)$ have $g(a_k+) \le G(b_k)$ for each $k$. Going over the proof of Lemma 7.1
carefully and changing $g$ to $G$ as necessary, we obtain a proof of the modified
Lemma 7.1.

The modifications necessary to the present proof are as follows. In the proof
that $U_r(x) < +\infty$ almost everywhere, the set $E$ is to be taken to be the set
where $F$ is continuous and this inequality fails. The inequality that results from
applying the modified Lemma 7.1 is $n(b_k - a_k) \le F(b_k+) - F(a_k+)$, and this
inequality can be summed on $k$ without any further change. Similarly in the proof
that $U_r(x) \le L_l(x)$ almost everywhere, the set $E_{pq}$ is to be taken to be the set
where $F$ is continuous and $L_l(x) < p < q < U_r(x)$. Inequality $(*)$ becomes
$F(b_i-) - F(a_i+) \le p(b_i - a_i)$. When we consider the interval $[a_i, b_i]$, the value
of $F(b_i+)$ is not relevant, and the value of $F(b_i)$ can be adjusted to equal $F(b_i-)$
for purposes of understanding $F$ between $a_i$ and $b_i$. With that understanding,
inequality $(**)$ becomes $q(b_{ij} - a_{ij}) \le F(b_{ij}+) - F(a_{ij}+)$, and step $(\dagger)$ is
replaced by

$$q \sum_j (b_{ij} - a_{ij}) \le \sum_j \big(F(b_{ij}+) - F(a_{ij}+)\big) \le F(b_i-) - F(a_i+) \le p(b_i - a_i).$$

The two inequalities at the ends have come about from $(**)$ and $(*)$, and the critical
observation is that the convention $F(b_i) = F(b_i-)$ makes the middle inequality
hold. The rest of the argument proceeds as in the case that $F$ is continuous, and
then the theorem is completely proved.


Theorem 7.3 (Fubini's theorem on the differentiation of series of monotone
functions). If $F = \sum_{n=1}^{\infty} F_n$ is an absolutely convergent series of monotone
increasing functions on $[a, b]$, then $F'(x) = \sum_{n=1}^{\infty} F_n'(x)$ almost everywhere.


PROOF. Without loss of generality, we may assume that $F_n(a) = 0$ for all $n$.
Then $F_n(x) \ge 0$ for all $n$ and $x$. Possibly by lumping terms, we may assume
also that $F(b) - \sum_{k=1}^{n} F_k(b) \le 2^{-n}$. Since $F(x) - \sum_{k=1}^{n} F_k(x)$ is a monotone
increasing function that is 0 for $x = a$, we have

$$0 \le F(x) - \sum_{k=1}^{n} F_k(x) \le 2^{-n} \qquad (*)$$

for $a \le x \le b$. The decomposition $F(x) = \sum_{k=1}^{n} F_k(x) + \big(\sum_{k=n+1}^{\infty} F_k(x)\big)$
exhibits $F$ as the sum of $n + 1$ monotone increasing functions, and thus we have
$\sum_{k=1}^{n} F_k'(x) \le F'(x)$ at all points where all the derivatives exist. In view of
Theorem 7.2, this inequality holds almost everywhere. Consequently

$$0 \le \sum_{k=1}^{\infty} F_k'(x) \le F'(x) \qquad (**)$$

almost everywhere. Now consider the series

$$G(x) = \sum_{n=1}^{\infty} \Big(F(x) - \sum_{k=1}^{n} F_k(x)\Big).$$

Then

$$0 \le G(x) - \sum_{n=1}^{N} \Big(F(x) - \sum_{k=1}^{n} F_k(x)\Big) \le \sum_{n=N+1}^{\infty} 2^{-n} = 2^{-N}.$$

Thus $G$ satisfies the same kind of inequality that $F$ did in $(*)$, and we can conclude
that $G$ satisfies the analog of $(**)$, namely

$$0 \le \sum_{n=1}^{\infty} \Big(F'(x) - \sum_{k=1}^{n} F_k'(x)\Big) \le G'(x).$$

The right side is finite almost everywhere by Theorem 7.2, and thus the individual
terms $F'(x) - \sum_{k=1}^{n} F_k'(x)$ of the series tend to 0 almost everywhere. This
completes the proof.


From Theorems 7.2 and 7.3, we can derive the first part of Lebesgue’s form of 
the Fundamental Theorem of Calculus. This same result was stated as Corollary 
6.40 and was proved in Chapter VI by using the Hardy—Littlewood Maximal 
Theorem. 


Corollary 7.4 (first part of Lebesgue's form of the Fundamental Theorem of
Calculus). If $f$ is integrable on every bounded subset of $\mathbb{R}^1$, then $\int_a^x f(t)\,dt$ is
differentiable almost everywhere and

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x) \quad \text{almost everywhere.}$$


PROOF. It is enough to prove the theorem for functions vanishing off an interval
$[a, b]$. Let $\mathcal{A}$ be the set of all Borel sets $E \subseteq [a, b]$ such that $\frac{d}{dx} \int_a^x I_E(t)\,dt = I_E(x)$
almost everywhere. Then $\mathcal{A}$ contains the elementary sets within $[a, b]$,
and $\mathcal{A}$ is closed under complements within $[a, b]$. If $\{E_n\}$ is an increasing
sequence of sets in $\mathcal{A}$ with $E_0 = \varnothing$ and with union $E$, let us write
$I_E = \sum_{n=1}^{\infty} (I_{E_n} - I_{E_{n-1}})$. This is a series of nonnegative functions. Putting
$F(x) = \int_a^x I_E(t)\,dt$ and $F_n(x) = \int_a^x (I_{E_n}(t) - I_{E_{n-1}}(t))\,dt$ and applying Corollary 5.27,
we obtain $F(x) = \sum_{n=1}^{\infty} F_n(x)$. Then Theorem 7.3 gives

$$F'(x) = \sum_{n=1}^{\infty} F_n'(x) = \lim_N \sum_{n=1}^{N} F_n'(x) = \lim_N \sum_{n=1}^{N} \big(I_{E_n}(x) - I_{E_{n-1}}(x)\big) = \lim_N I_{E_N}(x) = I_E(x)$$

almost everywhere. Thus $E$ is in $\mathcal{A}$, and $\mathcal{A}$ is closed under increasing countable
unions. Since $\mathcal{A}$ is closed under complements as well, $\mathcal{A}$ is closed under
decreasing countable intersections. Then the Monotone Class Lemma (Lemma
5.43) shows that $\mathcal{A}$ contains all Borel sets.

Now consider the set $\mathcal{F}$ of all integrable Borel functions $f$ for which the
almost-everywhere equality $\frac{d}{dx} \int_a^x f(t)\,dt = f(x)$ holds. We have just seen that
$\mathcal{F}$ contains all indicator functions of Borel subsets of $[a, b]$. By linearity, $\mathcal{F}$
contains all nonnegative simple functions vanishing off $[a, b]$. Let $f \ge 0$ be an
integrable function on $[a, b]$, and let $\{s_n\}$ be an increasing sequence of nonnegative
simple functions with pointwise limit $f$. The functions $s_n$ are in $\mathcal{F}$. Put $s_0 = 0$,
and let $F(x) = \int_a^x f(t)\,dt$ and $F_n(x) = \int_a^x (s_n(t) - s_{n-1}(t))\,dt$. Since $s_n \ge s_{n-1}$,
each $F_n$ is monotone increasing. Corollary 5.27 shows that $F(x) = \sum_{n=1}^{\infty} F_n(x)$,
and Theorem 7.3 then shows that

$$F'(x) = \sum_{n=1}^{\infty} F_n'(x) = \lim_N \sum_{n=1}^{N} F_n'(x) = \lim_N \sum_{n=1}^{N} \big(s_n(x) - s_{n-1}(x)\big) = \lim_N s_N(x) = f(x)$$

almost everywhere. Thus $\mathcal{F}$
contains all nonnegative integrable Borel functions, and by linearity it contains
all integrable Borel functions.
In this section we address questions about the Lebesgue integral raised by the
second part of the Fundamental Theorem of Calculus in Theorem 1.32. For
continuous integrands $f$, the result is a kind of uniqueness statement, asserting
that any function with derivative $f$ differs from $\int_a^x f(t)\,dt$ by a constant function.
From a practical point of view, this is the really important part of the theorem for 
calculus, since it provides a technique for evaluating definite integrals: find any 
function whose derivative is the given function, evaluate it at the endpoints, and 
subtract the results. With the Lebesgue integral and equality of derivatives only 
almost everywhere, the uniqueness result is not as sharp. The practical aspect of 
a uniqueness theorem is largely lost, and the resulting theory ends up having to 
be appreciated only as an end in itself. We begin at the following point. 


Proposition 7.5. Every monotone increasing function on $\mathbb{R}^1$ is uniquely the
sum of an indefinite integral $G(x) = \int_0^x f(t)\,dt$, where $f \ge 0$ is integrable
on every bounded interval, and a monotone increasing function $H$ such that
$H'(x) = 0$ almost everywhere.
PROOF. Let $F$ be a given monotone increasing function on $\mathbb{R}^1$. If $F =
G + H$ with $G$ as in the statement of the proposition and with $H'(x) = 0$
almost everywhere, Corollary 7.4 shows that we must have $f = F'$. This proves
uniqueness.

For existence we take $f = F'$. Regard $h$ as a positive number tending to 0
through some sequence, so that $h^{-1}(F(t + h) - F(t))$ tends to $f(t)$ for almost
every $t$. If $a < b$, then

$$\int_a^b h^{-1}\big(F(t + h) - F(t)\big)\,dt = \frac{1}{h} \int_{a+h}^{b+h} F(t)\,dt - \frac{1}{h} \int_a^b F(t)\,dt = \frac{1}{h} \int_b^{b+h} F(t)\,dt - \frac{1}{h} \int_a^{a+h} F(t)\,dt.$$

The right side tends to $F(b) - F(a)$ if $a$ and $b$ are points of continuity of $F$.
By Fatou's Lemma (Theorem 5.29), $\int_a^b f(t)\,dt \le F(b) - F(a)$ if $a$ and $b$ are
points of continuity of $F$. The points of continuity of $F$ are dense, and thus for
general $a$ and $b$, we can find sequences of points of continuity decreasing to $a$
and increasing to $b$. Passing to the limit, we obtain

$$\int_a^b f(t)\,dt \le F(b-) - F(a+) \le F(b) - F(a) \qquad (*)$$

for all $a$ and $b$. Hence $f$ is integrable on every bounded interval. With $G(x)$ as in the
statement of the proposition, $(*)$ gives $G(b) - G(a) \le F(b) - F(a)$. Equivalently,
$F(a) - G(a) \le F(b) - G(b)$. Thus the function $H(x) = F(x) - G(x)$ is monotone increasing
with $F = G + H$. Since $F$ and $G$ have derivative $f$ almost everywhere, $H$ has
derivative 0 almost everywhere.


Thus we want to identify all monotone increasing functions with derivative zero
almost everywhere. The first step is to see that the question of discontinuities of 
a monotone function can be completely eliminated from the problem. 


Proposition 7.6. Let $c$ be a real number. If $\{x_n\}$ is a sequence in $[a, b]$ and if
$\{c_n\}$ and $\{d_n\}$ are sequences of positive real numbers with $\sum c_n$ finite and $\sum d_n$
finite, then the function

$$F(x) = c + \sum_{n \text{ with } x_n \le x} c_n + \sum_{n \text{ with } x_n < x} d_n$$

is a monotone increasing function on $[a, b]$ with $F'(x) = 0$ almost everywhere.




PROOF. The function $F$ is certainly monotone increasing. It is the convergent
sum of the constant function $c$ and monotone increasing functions of the form

$$F_n(x) = \begin{cases} 0 & \text{for } x < x_n, \\ c_n & \text{for } x = x_n, \\ c_n + d_n & \text{for } x > x_n, \end{cases}$$

and the function $F_n$ has derivative 0 except at the point $x_n$. Thus the proposition
follows immediately from Theorem 7.3.
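
As a concrete illustration of the formula in Proposition 7.6, the following Python sketch builds such a function from finitely many jump points and checks its jumps in exact rational arithmetic. The jump points $1/(n+2)$, the weights $2^{-(n+1)}$ and $3^{-(n+1)}$, and the helper name `saltus` are hypothetical choices made only for this example; a finite truncation stands in for the infinite sums.

```python
from fractions import Fraction

def saltus(points, c, d, const=Fraction(0)):
    """Return F with F(x) = const + sum_{x_n <= x} c_n + sum_{x_n < x} d_n."""
    def F(x):
        total = const
        for xn, cn, dn in zip(points, c, d):
            if xn <= x:
                total += cn   # left jump c_n counts as soon as x reaches x_n
            if xn < x:
                total += dn   # right jump d_n counts only strictly past x_n
        return total
    return F

# Hypothetical data: jump points 1/2, 1/3, 1/4, ... with summable weights.
N = 20
points = [Fraction(1, n + 2) for n in range(N)]
c = [Fraction(1, 2 ** (n + 1)) for n in range(N)]
d = [Fraction(1, 3 ** (n + 1)) for n in range(N)]
F = saltus(points, c, d)

# Monotonicity on a grid of [0, 1].
grid = [Fraction(k, 1000) for k in range(1001)]
values = [F(x) for x in grid]
is_monotone = all(u <= v for u, v in zip(values, values[1:]))

# Jumps at the point x_0 = 1/2, and a flat stretch away from all jump points.
left_jump = F(Fraction(1, 2)) - F(Fraction(2, 5))    # equals c_0
right_jump = F(Fraction(1)) - F(Fraction(1, 2))      # equals d_0
flat = F(Fraction(7, 10)) - F(Fraction(6, 10))       # no jump points in between
```

Between consecutive jump points the function is constant, so every difference quotient there vanishes; this is consistent with the conclusion $F'(x) = 0$ almost everywhere, although the code checks only the finite truncation.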


A monotone increasing function on the line whose restriction to every closed 
bounded interval is of the form in Proposition 7.6 is called a saltus function; the 
name comes from the Latin word for "jump." Since $\mathbb{R}^1$ is the countable union
of closed bounded intervals, it follows from Proposition 7.6 that every saltus 
function has derivative 0 almost everywhere. 


Proposition 7.7. Any monotone increasing function $F$ on $\mathbb{R}^1$ is uniquely the
sum $F = G + S$ of a continuous monotone increasing function $G$ with $G(0) = 0$
and a saltus function $S$.


PROOF. For existence, it is enough to obtain the decomposition without
insisting on the normalization $G(0) = 0$, since the sum of a saltus function
and a constant is a saltus function. Let $x_0$ be a point of continuity of $F$, and
enumerate the points of discontinuity of $F$ as $x_n$, $n \ge 1$. For each $n \ge 1$, define
$c_n = F(x_n) - F(x_n-)$ and $d_n = F(x_n+) - F(x_n)$. Let $S$ be the saltus function

$$S(x) = \begin{cases} \displaystyle\sum_{x_0 < x_n \le x} c_n + \sum_{x_0 < x_n < x} d_n & \text{if } x \ge x_0, \\[2ex] \displaystyle -\sum_{x < x_n < x_0} c_n - \sum_{x \le x_n < x_0} d_n & \text{if } x < x_0, \end{cases}$$

and put $G = F - S$. Then $G$ is continuous everywhere. To see that $G$ is monotone
increasing, let $a < b$ be points of continuity of $F$ and $S$. We start from the equality
$S(x_n+) - S(x_n-) = F(x_n+) - F(x_n-)$ and sum for $x_n$ with $a < x_n < b$ to
obtain

$$S(b) - S(a) = \sum_{a < x_n < b} \big(S(x_n+) - S(x_n-)\big) = \sum_{a < x_n < b} \big(F(x_n+) - F(x_n-)\big) \le F(b) - F(a).$$

Hence $F(a) - S(a) \le F(b) - S(b)$, and we conclude that $G(a) \le G(b)$ at
all points of continuity $a < b$ of $F$ and $S$. These points are dense, and $G$
is continuous everywhere. Hence $G(a) \le G(b)$ whenever $a < b$, and $G$ is
monotone increasing. This proves existence. Uniqueness follows from the fact
that $S(b-) - S(a+) = \sum_{a < x_n < b} \big(F(x_n+) - F(x_n-)\big)$ whenever $a < b$, and the
proof is complete.
Consequently we need to understand the continuous monotone increasing
functions $F$ on the line with $F'(x) = 0$ almost everywhere. The Cantor function
for the standard Cantor set, constructed as in Section VI.8, is an example. For
such a function, $F - F(0)$ satisfies the defining properties of the distribution
function of a Stieltjes measure $\mu$ on $\mathbb{R}^1$. The continuity of $F$ is equivalent to the
fact that $\mu$ contains no point masses. The following property isolates the meaning
of having derivative zero almost everywhere.


Proposition 7.8. Suppose that $\mu$ is a Stieltjes measure with no point masses.
If the distribution function $F$ of $\mu$ has $F'(x) = 0$ at every point of a Borel set $E$,
then $\mu(E) = 0$.


REMARK. The proof will use the Rising Sun Lemma (Lemma 7.1). Problem 3
at the end of the chapter asks for an alternative proof by means of Wiener's
Covering Lemma (Lemma 6.41). A proof using Wiener's Covering Lemma does
not make use of the continuity of $F$, and therefore it is not necessary to assume
in the proposition that $\mu$ has no point masses.


PROOF. We may confine our attention to an interval $[a, b]$, taking $E$ to be a
subset of $[a, b]$. Since $\mu$ has no point masses, we may discard $a$ and $b$ from $E$.
Fix a positive integer $n$. For every point $x$ in $E$, we have $F'(x) < \frac{1}{n}$. Therefore
to each such $x$, we can associate some $\xi > x$ with $\xi$ in $(a, b)$ such that

$$\frac{F(\xi) - F(x)}{\xi - x} < \frac{1}{n}.$$

This inequality says that $\frac{1}{n}\xi - F(\xi) > \frac{1}{n}x - F(x)$, hence that the continuous
function $g$ with $g(x) = \frac{1}{n}x - F(x)$ has $g(\xi) > g(x)$. The Rising Sun Lemma
(Lemma 7.1) applies and shows that $E$ is covered by countably many disjoint
open intervals $(a_k, b_k)$ with $g(a_k) \le g(b_k)$. Thus $\frac{1}{n}a_k - F(a_k) \le \frac{1}{n}b_k - F(b_k)$
for each $k$. Adding, we obtain

$$\mu(E) \le \sum_k \mu((a_k, b_k)) = \sum_k \big(F(b_k) - F(a_k)\big) \le \frac{1}{n} \sum_k (b_k - a_k) \le \frac{1}{n}(b - a).$$

Since $n$ is arbitrary, $\mu(E) = 0$.


Again consider a continuous monotone function $F$ with derivative zero almost
everywhere. The function $F - F(0)$ is the distribution function of some Stieltjes
measure $\mu$ with no point masses, and Proposition 7.8 shows that there is a Borel
set $E$ such that $\mu(E) = 0$ and $m(E^c) = 0$, where $m$ is Lebesgue measure. In
other words, $\mu$ is concentrated completely on the set $E^c$ of Lebesgue measure 0.
A Stieltjes measure $\mu$ for which there is a Borel set $F$ with $\mu(F^c) = 0$ and
$m(F) = 0$ is called a singular Stieltjes measure or a "Stieltjes measure singular
with respect to Lebesgue measure." If also it contains no point masses, it is said to
be continuous singular. The Stieltjes measure associated to the Cantor function
for the standard Cantor set is an example. We can summarize matters either in
terms of decompositions of monotone functions or in terms of decompositions of
Stieltjes measures. The result in the case of monotone functions is a first answer
to the question of uniqueness in the Fundamental Theorem of Calculus for the
Lebesgue integral; the result in the case of Stieltjes measures gives the Lebesgue
decomposition of Stieltjes measures.
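
The Cantor example can be made concrete with a short Python sketch. It evaluates the Cantor function by reading ternary digits, a standard description that is assumed here rather than taken from the construction in Section VI.8, and then compares the Lebesgue measure of the $n$-th stage covering of the Cantor set with the mass that the associated Stieltjes measure gives to that covering. The function names and the choice $n = 10$ are illustrative only.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Cantor function on [0, 1]; exact when x has a short ternary expansion."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:          # x lies in a removed middle third: F is constant there
            return value + scale
        if digit == 2:
            value += scale
        scale /= 2
    return value

def stage_intervals(n):
    """Closed intervals remaining after n steps of removing middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))
            refined.append((b - third, b))
        intervals = refined
    return intervals

n = 10
cover = stage_intervals(n)
total_length = sum(b - a for a, b in cover)                  # Lebesgue measure of the cover
total_mass = sum(cantor(b) - cantor(a) for a, b in cover)    # Stieltjes mass of the cover
plateau = cantor(Fraction(3, 5)) - cantor(Fraction(2, 5))    # F is flat on (1/3, 2/3)
```

For every $n$ the covering carries the full mass 1 while its length is $(2/3)^n$, which is the behavior the definition of a singular measure describes; the plateau check reflects that the difference quotients vanish on each removed interval.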


Theorem 7.9. Every monotone increasing function $F$ on $\mathbb{R}^1$ decomposes
uniquely as the sum $F = G + H + S$, where $G$ is the indefinite integral $G(x) =
\int_0^x f(t)\,dt$ of a function $f \ge 0$ integrable on every bounded interval, $H$ is the
distribution function of a continuous singular measure, and $S$ is a saltus function.
The function $f$ is the derivative of $F$.


PROOF. Proposition 7.7 allows us to write $F = P + S$ uniquely, where $S$
is a saltus function and $P$ is continuous and monotone increasing with $P(0) = 0$.
Proposition 7.5 says that $P = G + H$ uniquely, where $G$ is an indefinite
integral $G(x) = \int_0^x f(t)\,dt$ and $H$ is monotone increasing with $H'(x) = 0$
almost everywhere. The function $f$ is the derivative of $F$. The function $H$ has
$H(0) = 0$ and is continuous because $P$ and $G$ have these properties, and therefore
$H$ is the distribution function of a Stieltjes measure $\mu$ containing no point masses.
Since $H'(x) = 0$ almost everywhere, Proposition 7.8 shows that $\mu$ is singular.


Corollary 7.10 (Lebesgue decomposition). Every Stieltjes measure $\mu$ decomposes
uniquely as the sum $\mu = f\,dx + \mu_s + \mu_d$, where $f \ge 0$ is a function
integrable on every bounded interval, $\mu_s$ is a continuous singular measure, and
$\mu_d$ is a countable sum of point masses such that the sum of the weights on any
bounded interval is finite.


PROOF. This follows by applying Theorem 7.9 to the distribution function
of $\mu$.


The final question that we address in this section is how to recognize the
particular monotone function $G(x) = \int_0^x f(t)\,dt$ from among all the monotone
functions $F = G + H + S$ described in Theorem 7.9.


Proposition 7.11. With $m$ denoting Lebesgue measure, the following conditions
on a Stieltjes measure $\mu_a$ are equivalent:

(a) $\mu_a$ is of the form $\mu_a = f\,dx$ for some function $f \ge 0$ that is integrable
on every bounded interval,

(b) for each bounded interval $[a, b]$ and number $\epsilon > 0$, there exists a number
$\delta > 0$ such that $\mu_a(E) < \epsilon$ whenever $E$ is a Borel subset of $[a, b]$ with
$m(E) < \delta$,

(c) $\mu_a(E) = 0$ whenever $E$ is a Borel subset of $\mathbb{R}^1$ with $m(E) = 0$.


REMARK. A Stieltjes measure $\mu$ satisfying the equivalent conditions in this
proposition is said to be absolutely continuous or "absolutely continuous with
respect to Lebesgue measure." From any of these defining conditions, we see
right away that an absolutely continuous measure contains no point masses.


PROOF. Corollary 5.24 shows immediately that (a) implies (b). To see that (b)
implies (c), let $E$ be a Borel set in $\mathbb{R}^1$ with $m(E) = 0$. Applying (b) to $E \cap [a, b]$
gives $\mu_a(E \cap [a, b]) < \epsilon$ for every positive $\epsilon$, and hence $\mu_a(E \cap [a, b]) = 0$.
Since $[a, b]$ is arbitrary and $\mu_a$ is completely additive, $\mu_a(E) = 0$.

To see that (c) implies (a), we appeal to Corollary 7.10 to decompose $\mu_a$
according to the Lebesgue decomposition as

$$\mu_a = f\,dx + \mu_s + \mu_d, \qquad (*)$$

where $\mu_s$ is continuous singular and $\mu_d$ is discrete. The measures $\mu_s$ and $\mu_d$
have the property that there is a Borel set $E$ with $m(E) = 0$ such that $\mu_s(E^c) =
\mu_d(E^c) = 0$. Condition (c) shows that $\mu_a(E) = 0$. Evaluating $(*)$ at $E$, we
obtain $0 = \mu_a(E) = 0 + \mu_s(E) + \mu_d(E)$. Therefore $\mu_s(E) = \mu_d(E) = 0$.
Since $\mu_s(E^c) = \mu_d(E^c) = 0$ also, we must have $\mu_s = \mu_d = 0$, and then $(*)$
shows that $\mu_a = f\,dx$.


In Chapter IX the implication (c) implies (a) will be generalized to a result in 
abstract measure theory known as the Radon—Nikodym Theorem. Meanwhile, it 
is conditions (b) and (c) that we can translate into a condition on the corresponding 
distribution function, and then we shall have our second and final answer to the 
question of uniqueness in the Fundamental Theorem of Calculus for the Lebesgue 
integral. A monotone increasing function $F$ on the line is said to be absolutely
continuous if for each bounded interval $[a, b]$ and number $\epsilon > 0$, there exists a
$\delta > 0$ such that on any countable disjoint union $\bigcup_k (a_k, b_k)$ of intervals within
$[a, b]$ having total length $< \delta$, the variation $\sum_k (F(b_k) - F(a_k))$ of $F$ on that
union of intervals is $< \epsilon$.


Proposition 7.12. A Stieltjes measure is absolutely continuous if and only if 
its distribution function is absolutely continuous. 


PROOF. Let $\mu$ be a Stieltjes measure with distribution function $F$. Suppose
that $\mu$ is absolutely continuous. Fix an interval $[a, b]$, let $\epsilon > 0$ be given, and
choose $\delta > 0$ by (b) in Proposition 7.11 such that $m(E) < \delta$ implies $\mu(E) < \epsilon$.
If the set $A = \bigcup_k (a_k, b_k)$ is a countable disjoint union of intervals within $[a, b]$
having total length $< \delta$, then $m(A) < \delta$, and hence $\mu(A) < \epsilon$. Therefore
$\sum_k (F(b_k) - F(a_k)) = \sum_k \mu((a_k, b_k)) = \mu(A) < \epsilon$, and we conclude that $F$
is absolutely continuous.

Conversely suppose that $F$ is absolutely continuous, and suppose that $E$ is a
Borel set with $m(E) = 0$. Fix an interval $[a, b]$, and let $\epsilon > 0$ be given. By
absolute continuity of $F$, there exists a $\delta > 0$ such that on any countable disjoint
union $\bigcup_k (a_k, b_k)$ of intervals within $[a, b]$ having total length $< \delta$, the variation
$\sum_k (F(b_k) - F(a_k))$ of $F$ on that union of intervals is $< \epsilon$. With $\delta$ defined in
this way, we can find a countable disjoint union of intervals $\bigcup_k (a_k, b_k)$ covering
$E \cap [a, b]$ and having total length $< \delta$. Then $\mu(E \cap [a, b]) \le \mu\big(\bigcup_k (a_k, b_k)\big) =
\sum_k \mu((a_k, b_k)) = \sum_k (F(b_k) - F(a_k)) < \epsilon$. Since $\epsilon$ is arbitrary, $\mu(E \cap [a, b]) = 0$.
Since $[a, b]$ is arbitrary, $\mu(E) = 0$. Therefore $\mu$ satisfies (c) in Proposition
7.11 and is absolutely continuous.


Corollary 7.13 (second part of Lebesgue's form of the Fundamental Theorem
of Calculus). Let $F$ be a monotone increasing function on $\mathbb{R}^1$, and let $f$ be its
almost-everywhere derivative. Then $\int_a^b f(t)\,dt = F(b) - F(a)$ whenever $a < b$
if and only if $F$ is absolutely continuous.


PROOF. By Theorem 7.9 we can write $F(x) = \int_0^x f(t)\,dt + H(x) + S(x)$,
where $H$ is the distribution function of a continuous singular measure and $S$ is a
saltus function. For $a < b$, we then have

$$F(b) - F(a) = \int_a^b f(t)\,dt + \big(H(b) - H(a)\big) + \big(S(b) - S(a)\big).$$

If $F(b) - F(a) = \int_a^b f(t)\,dt$ whenever $a < b$, then the monotonicity of $H$ and $S$
forces $H$ and $S$ to be constant functions, say with $H(0) + S(0) = c$. Substituting,
we see that $F(x) = \int_0^x f(t)\,dt + c$ for all $x$. The function $\int_0^x f(t)\,dt$ is absolutely
continuous by Proposition 7.12, and the additive constant $c$ does not hurt matters.
Thus $F$ is absolutely continuous.

Conversely if $F$ is absolutely continuous, then it is continuous, and its monotonicity
forces $F - F(0)$ to be a distribution function of some Stieltjes measure
$\mu$. Proposition 7.12 shows that the measure $\mu$ is absolutely continuous, and
Proposition 7.11 shows that $\mu$ is of the form $\mu = g\,dx$. Therefore $F(x) - F(0) =
\int_0^x g(t)\,dt$. By Corollary 7.4, $g = F' = f$. Hence $F(b) - F(a) = \int_a^b f(t)\,dt$
whenever $a < b$.


3. Problems 


1. In the Rising Sun Lemma (Lemma 7.1), show that $g(a_k) = g(b_k)$ if $a_k \ne a$.
Give an example of a continuous $g$ for which one of the intervals $(a_k, b_k)$ has
$a_k = a$ and $g(a_k) < g(b_k)$.

2. Let $m$ be Lebesgue measure. Does there exist a Lebesgue measurable set $E$ such
that $m(E \cap I) = \frac{1}{2} m(I)$ for every bounded interval $I$? Why or why not?

3. Prove Proposition 7.8 using Wiener's Covering Lemma (Lemma 6.41) instead
of the Rising Sun Lemma (Lemma 7.1).

4. Find all continuous monotone increasing functions on $\mathbb{R}^1$ with derivative 0 at all
but countably many points.

5. Cantor sets within $[0, 1]$ were introduced in Section II.9. Each is associated to
a sequence $\{r_n\}_{n \ge 1}$ of numbers with $0 < r_n < 1$, the standard Cantor set being
obtained when $r_n = 1/3$ for every $n$. Section VI.8 showed how to associate a
distribution function to the standard Cantor set, and in similar fashion one can
associate a distribution function to any Cantor set. Let $C$ be a Cantor set, let
$F$ be the associated distribution function, and let $\mu$ be the associated Stieltjes
measure. The Lebesgue measure of $C$ is the number $P = \prod_{n=1}^{\infty} (1 - r_n)$. Prove
that

(a) $\mu$ is singular if $P = 0$,

(b) $\mu$ is absolutely continuous if $P > 0$, being of the form $P^{-1} I_C(x)\,dx$.


Problems 6—7 concern the Lebesgue set of an integrable function f on an interval 
[a, b]. This is the set where £ i. | f(t) — f (x)| dt exists and equals 0. Many almost- 
everywhere convergence results involving $f$ are valid at every point of the Lebesgue
set. Such results may be regarded as relatively straightforward consequences of
Corollary 7.4. Conversely an almost-everywhere convergence theorem that fails to
hold at some point of the Lebesgue set might well be expected to involve some new
idea.


6. For $f$ integrable on $[a, b]$, prove that almost every point of $(a, b)$ is in the
Lebesgue set of $f$ by showing that the Lebesgue set of $f$ is the same as the set
where $\frac{d}{dx} \int_a^x |f(t) - r|\,dt = |f(x) - r|$ for some rational $r$.


7. The Fejér kernel, which was defined in Section I.10 and studied further in
Section VI.7, is the periodic function defined for $t$ in $[-\pi, \pi]$ by
$K_N(t) = \frac{1}{N+1} \, \frac{1 - \cos (N+1)t}{1 - \cos t}$. Let $f$ be integrable on $[-\pi, \pi]$, regard $f$ as periodic, and let $x$
be in the Lebesgue set of $f$. Prove that $\lim_N (K_N * f)(x) = f(x)$ by following
these steps:

(a) Check that the estimates $K_N(t) \le N + 1$ and $K_N(t) \le c/(Nt^2)$ are valid
for all $N$ and for $|t| \le \pi$.

(b) Check that the problem is to show that $\int_{-\pi}^{\pi} K_N(t) |f(x - t) - f(x)|\,dt$
tends to 0 as $N$ tends to infinity.

(c) Break the integral in (b) into pieces where $|t| \le 1/N$, where $2^{k-1}/N \le
|t| \le 2^k/N$ for $1 \le k \le \log_2(N^{3/4})$, and where $1/N^{1/4} \le |t| \le \pi$. Using
the better of the bounds in (a) in each piece, prove the statement that (b) says
needs to be shown.




Problems 8-12 concern singular Stieltjes measures, which for notational convenience
we assume are continuous singular. In all these problems it is assumed that $\mu$ is a
continuous singular measure and $m$ is Lebesgue measure. Among other things these
problems prove that the indefinite integral of $\mu$ has derivative 0 almost everywhere
with respect to Lebesgue measure, i.e., $\frac{d}{dx} \int_0^x d\mu(t) = 0$ a.e. $[dx]$, with the tools of
Chapter VI and without Theorem 7.2.


8. If $\epsilon > 0$ is given, prove by considering $m + \mu$ that there exists an open set $U$ in
$\mathbb{R}^1$ such that $\mu(U) < \epsilon$ and $m(U^c) = 0$.

9. If $U$ is an open subset of $\mathbb{R}^1$ and $\nu$ is a Stieltjes measure with $\nu(U) = 0$, prove
that $\lim_{h \downarrow 0} (2h)^{-1} \nu((x - h, x + h)) = 0$ for all $x$ in $U$.

10. Let $\nu$ be any finite Stieltjes measure, and define

$$\nu^*(x) = \sup_{h > 0} \, (2h)^{-1} \nu((x - h, x + h)).$$

Prove for each $\xi > 0$ that $m\{x \mid \nu^*(x) > \xi\} \le 5\nu(\mathbb{R}^1)/\xi$ by imitating the proof
of Theorem 6.38.

11. For the singular measure $\mu$, assume that $\mu(\mathbb{R}^1)$ is finite. Let $\epsilon > 0$ be given,
and choose an open set $U$ as in Problem 8. Define Stieltjes measures $\mu_1$ and $\mu_2$
by $\mu_1(A) = \mu(A \cap U)$ and $\mu_2(A) = \mu(A - U)$. Use Problem 9 to prove that
$\lim_{h \downarrow 0} (2h)^{-1} \mu_2((x - h, x + h)) = 0$ a.e. $[dx]$, and use Problem 10 to prove for
all $\xi > 0$ that

$$m\Big\{x \;\Big|\; \limsup_{h \downarrow 0} \, (2h)^{-1} \mu((x - h, x + h)) > \xi\Big\} \le 5\epsilon/\xi.$$

12. Deduce from Problem 11 that $\lim_{h \downarrow 0} (2h)^{-1} \mu((x - h, x + h)) = 0$ a.e. $[dx]$. By
reviewing the proof of Corollary 6.40, show how the argument in Problems 8-11
can be adjusted to yield the better conclusion that $\frac{d}{dx} \int_0^x d\mu(t) = 0$ a.e. $[dx]$.


CHAPTER VIII 


Fourier Transform in Euclidean Space 


Abstract. This chapter develops some of the theory of the $\mathbb{R}^N$ Fourier transform as an operator that
carries certain spaces of complex-valued functions on $\mathbb{R}^N$ to other spaces of such functions.

Sections 1-3 give the indispensable parts of the theory, beginning in Section 1 with the definition,
the fact that integrable functions are mapped to bounded continuous functions, and various
transformation rules. In Section 2 the main results concern $L^1$, chiefly the vanishing of the Fourier
transforms of integrable functions at infinity, the fact that the Fourier transform is one-one, and
the all-important Fourier inversion formula. The third section builds on these results to establish a
theory for $L^2$. The Fourier transform carries functions in $L^1 \cap L^2$ to functions in $L^2$, preserving the
$L^2$ norm; this is the Plancherel formula. The Fourier transform therefore extends by continuity to
all of $L^2$, and the Riesz-Fischer Theorem says that this extended mapping is onto $L^2$. These results
allow one to construct bounded linear operators on $L^2$ commuting with translations by multiplying
by $L^\infty$ functions on the Fourier transform side and then using Fourier inversion; a converse theorem
is proved in the next section.

Section 4 discusses the Fourier transform on the Schwartz space, the subspace of $L^1$ consisting of
smooth functions with the property that the product of any iterated partial derivative of the function
with any polynomial is bounded. The Fourier transform carries the Schwartz space in one-one
fashion onto itself, and this fact leads to the proof of the converse theorem mentioned above.

Section 5 applies the Schwartz space in $\mathbb{R}^1$ to obtain the Poisson Summation Formula, which
relates Fourier series and the Fourier transform. A particular instance of this formula allows one to
prove the functional equation of the Riemann zeta function.

Section 6 develops the Poisson integral formula, which transforms functions on $\mathbb{R}^N$ into harmonic
functions on a half space in $\mathbb{R}^{N+1}$. A function on $\mathbb{R}^N$ can be recovered as boundary values of its
Poisson integral in various ways.

Section 7 specializes the theory of the previous section to $\mathbb{R}^1$, where one can associate a "conjugate"
harmonic function to any harmonic function in the upper half plane. There is an associated
conjugate Poisson kernel that maps a boundary function to a harmonic function conjugate to the
Poisson integral. The boundary values of the harmonic function and its conjugate are related by the
Hilbert transform, which implements a "90° phase shift" on functions. The Hilbert transform is a
bounded linear operator on $L^2$ and is of weak type (1, 1).


1. Elementary Properties 


Although the Fourier transform in the one-variable case dates from the early 
nineteenth century, it was not until the introduction of the Lebesgue integral 
early in the twentieth century that the theory could advance very far. Fourier 
series in one variable have a standard physical interpretation as representing
a resolution into component frequencies of a periodic signal that is given as
a function of time. In the presence of the Riesz-Fischer Theorem, they are
especially handy at analyzing time-independent operators on signals, such as
those given by filters. An operator of this kind takes a function $f$ with Fourier
series $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$ into the expression $\sum_{n=-\infty}^{\infty} m_n c_n e^{inx}$, where the
constants $m_n$ depend only on the filter. If the original function $f$ is in $L^2$ and if
the constants $m_n$ are bounded, the Riesz-Fischer Theorem allows one to interpret
the new series as the Fourier series of a new $L^2$ function $T(f)$, and thus the effect
of the filter is to carry $f$ to $T(f)$.

If one imagines that the period is allowed to increase without limit, one can 
hope to obtain convergence of some sort to a transform that handles aperiodic 
signals, and this was once a common attitude about how to view the Fourier 
transform. In the twentieth century the Fourier transform began to be developed 
as an object in its own right, and soon the theory was extended from one variable 
to several variables. 

The Fourier transform in Euclidean space $\mathbb{R}^N$ is a mapping of suitable kinds
of functions on $\mathbb{R}^N$ to other functions on $\mathbb{R}^N$. The functions will in all cases
now be assumed to be complex valued. The underlying $\mathbb{R}^N$ is usually regarded
as space, rather than time, and the Fourier transform is of great importance in
studying operators that commute with translations, i.e., spatially homogeneous
operators. One example of such an operator is a linear partial differential operator
with constant coefficients, and another is convolution with a fixed function. In
the latter case if $\mathcal{F}$ denotes the Fourier transform and $h$ is a fixed function, the
relevant formula is $\mathcal{F}(h * f) = \mathcal{F}(h)\mathcal{F}(f)$, the product on the right side being
the pointwise product of two functions. Thus convolution can be understood in
terms of the simpler operation of pointwise multiplication if we understand what
$\mathcal{F}$ does and we understand how to invert $\mathcal{F}$.

In the actual definition of the Fourier transform, factors of $2\pi$ invariably pop
up here and there, and there is no universally accepted place to put these factors.
This ambiguity is not unlike the distinction between radians and cycles in connection
with frequencies in physics; again the distinction is a factor of $2\pi$. The
definition that we shall use occurs quite commonly these days, namely

$$\boxed{\;\widehat{f}(y) = \int_{\mathbb{R}^N} f(x)\, e^{-2\pi i x \cdot y}\,dx\;}$$

with $x \cdot y$ referring to the dot product and with the $2\pi$ in the exponent. The formula
for $\mathcal{F}^{-1}$ will turn out to be similar looking, except that the minus sign is changed
to plus in the exponent. Some authors drop the $2\pi$ from the exponent, and then
a factor of $(2\pi)^{-N}$ is needed in the inversion formula. Other authors who drop
the $2\pi$ from the exponent also include a factor of $(2\pi)^{-N}$ in front of the integral;
then the inversion formula requires no such factor. Still other authors who drop
the $2\pi$ from the exponent insert a factor of $(2\pi)^{-N/2}$ in the formula for both $\mathcal{F}$
and its inverse. In all cases, there is a certain utility in adjusting the definition
of convolution by an appropriate power of $2\pi$ so that the Fourier transform of
a convolution is indeed the pointwise product of the Fourier transforms. The
relationships among these alternative formulas are examined in Problem 1 at the
end of the chapter.

At any rate, in this book we take the boxed formula above as the definition of
the Fourier transform of a function $f$ in $L^1(\mathbb{R}^N, dx)$. Convolution was defined in
Section VI.2. Although there are many elementary functions for which one can
compute the Fourier transform explicitly, there are precious few for which one
can make the pair of calculations that compute the Fourier transform and verify
the inversion formula. One example is $e^{-\pi |x|^2}$, which will be examined in the
next section.

Recall from Section VI.1 that the translate t,, f is defined by t,, f(x) = 


f (&% — x0). 


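Proposition 8.1. The Fourier transform on $L^1(\mathbb{R}^N)$ has these properties: 

(a) $f$ in $L^1$ implies that $\widehat f$ is bounded and uniformly continuous with 
$\|\widehat f\|_{\sup} \le \|f\|_1$, 
(b) $f$ in $L^1$ implies that the translate $\tau_{x_0}f$ and the product $f(x)e^{2\pi i x\cdot y_0}$ have 
$\widehat{\tau_{x_0}f}(y) = e^{-2\pi i x_0\cdot y}\,\widehat f(y)$ and $\mathcal{F}(f(x)e^{2\pi i x\cdot y_0})(y) = (\tau_{y_0}\widehat f)(y)$, 
(c) $f$ and $g$ in $L^1$ implies $\widehat{f * g} = \widehat f\,\widehat g$, 
(d) $f$ in $L^1$ implies $\widehat{f^*} = \overline{\widehat f}$, where $f^*(x) = \overline{f(-x)}$, 
(e) (multiplication formula) $f$ and $g$ in $L^1$ implies $\int_{\mathbb{R}^N} f\,\widehat g\,dx = \int_{\mathbb{R}^N} \widehat f\,g\,dx$, 
(f) $f$ in $L^1$ and $2\pi i x_j f$ in $L^1$ implies that $\frac{\partial \widehat f}{\partial y_j}$ exists in the ordinary sense 
everywhere and satisfies $\frac{\partial \widehat f}{\partial y_j} = \mathcal{F}(-2\pi i x_j f)$, 
(g) $f$ in $L^1$ and $\frac{\partial f}{\partial x_j}$ existing in the $L^1$ sense, i.e., $\lim_{h\to 0} h^{-1}(\tau_{-he_j}f - f)$ 
existing in $L^1$, implies $\mathcal{F}\big(\frac{\partial f}{\partial x_j}\big)(y) = 2\pi i y_j\,\widehat f(y)$. This formula holds also 
when $f$ is in $L^1 \cap C^1$, the ordinary $\frac{\partial f}{\partial x_j}$ is in $L^1$, and $f$ vanishes at infinity. 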


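PROOF. All the integrals will be over $\mathbb{R}^N$, and we drop $\mathbb{R}^N$ from the notation. 
For (a), we have $|\widehat f(y)| \le \int |f(x)|\,dx = \|f\|_1$, and hence $\|\widehat f\|_{\sup} \le \|f\|_1$. Also, 

$$|\widehat f(y_1) - \widehat f(y_2)| \le \int |f(x)|\,\big|e^{-2\pi i x\cdot y_1} - e^{-2\pi i x\cdot y_2}\big|\,dx = \int |f(x)|\,\big|e^{-2\pi i x\cdot (y_1 - y_2)} - 1\big|\,dx.$$

On the right side the second factor of the integrand is bounded by 2 and tends 
to 0 for each $x$ as $y_1 - y_2$ tends to 0. Thus the right side tends to 0 by dominated 
convergence at a rate depending only on $y_1 - y_2$. 

For (b), $\widehat{\tau_{x_0}f}(y) = \int f(x - x_0)e^{-2\pi i x\cdot y}\,dx = \int f(x)e^{-2\pi i (x + x_0)\cdot y}\,dx = 
e^{-2\pi i x_0\cdot y}\,\widehat f(y)$ and 

$$\mathcal{F}(f(x)e^{2\pi i x\cdot y_0})(y) = \int f(x)e^{2\pi i x\cdot y_0}e^{-2\pi i x\cdot y}\,dx = \int f(x)e^{-2\pi i x\cdot (y - y_0)}\,dx = (\tau_{y_0}\widehat f)(y).$$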


For (c), we use Fubini’s Theorem. The standard technique for verifying the 
theorem’s applicability was mentioned near the end of Section V.7. Let us see 
the technique in context this once. The procedure is to write out the computation, 
blindly making the interchange, and then to check the validity of the interchange 
by imagining that absolute value signs have been put in place. What needs to 
be verified is that the double or iterated integrals with the absolute value signs in 
place are finite. The computation here is 


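$$\widehat{f * g}(y) = \iint f(x - t)g(t)e^{-2\pi i x\cdot y}\,dt\,dx = \iint f(x - t)g(t)e^{-2\pi i x\cdot y}\,dx\,dt$$
$$= \iint f(x)g(t)e^{-2\pi i (x + t)\cdot y}\,dx\,dt = \widehat f(y)\,\widehat g(y).$$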


The steps with absolute value signs in place around the integrands are 


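$$\iint \big|f(x - t)g(t)e^{-2\pi i x\cdot y}\big|\,dt\,dx = \iint \big|f(x - t)g(t)e^{-2\pi i x\cdot y}\big|\,dx\,dt = \iint \big|f(x)g(t)e^{-2\pi i (x + t)\cdot y}\big|\,dx\,dt.$$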


The first interchange is valid, but the first and second integrals are not so clearly 
finite. What is clear, because f and g are integrable, is that we have finiteness 
for the third integral, and the second and third integrals are equal by a translation 
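in the inner integration. Thus the computation of $\widehat{f * g}(y)$ is justified. 
For (d), we have $\widehat{f^*}(y) = \int \overline{f(-x)}\,e^{-2\pi i x\cdot y}\,dx = \int \overline{f(x)}\,e^{2\pi i x\cdot y}\,dx = \overline{\widehat f(y)}$. 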
For (e), we use Fubini’s Theorem, justifying the details in the same way as in 
(c). We obtain 


$$\int f\,\widehat g\,dx = \iint f(x)g(y)e^{-2\pi i x\cdot y}\,dy\,dx = \iint f(x)g(y)e^{-2\pi i x\cdot y}\,dx\,dy = \int \widehat f(y)g(y)\,dy,$$


and the interchange is valid because f and g have been assumed integrable. 
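For (f), we apply (b) and obtain 

$$h^{-1}\big(\widehat f(y + he_j) - \widehat f(y)\big) = \mathcal{F}\big(f(x)\,h^{-1}(e^{-2\pi i h e_j\cdot x} - 1)\big)(y).$$

Application of the Mean Value Theorem to the real and imaginary parts of 
$h^{-1}(e^{-2\pi i h e_j\cdot x} - 1)$ shows for $|h| \le 1$ that 

$$|\mathrm{Re}(h^{-1}(e^{-2\pi i h e_j\cdot x} - 1))| = |h^{-1}(1 - \cos 2\pi h x_j)| \le 2\pi|x_j|$$
and 
$$|\mathrm{Im}(h^{-1}(e^{-2\pi i h e_j\cdot x} - 1))| = |h^{-1}\sin 2\pi h x_j| \le 2\pi|x_j|,$$
hence that $|h^{-1}(e^{-2\pi i h e_j\cdot x} - 1)| \le 4\pi|x_j|$. 

Since $x_j f(x)$ is assumed integrable, we have dominated convergence in the 
computation of the limit of $\mathcal{F}\big(f(x)\,h^{-1}(e^{-2\pi i h e_j\cdot x} - 1)\big)(y)$ as $h$ tends to 0, 
and we get $\mathcal{F}(-2\pi i x_j f)(y) = \frac{\partial \widehat f}{\partial y_j}(y)$. 

For the first part of (g), the assumptions and (a) give 

$$\Big|\mathcal{F}\big(h^{-1}(\tau_{-he_j}f - f)\big)(y) - \mathcal{F}\big(\tfrac{\partial f}{\partial x_j}\big)(y)\Big| \le \Big\|h^{-1}(\tau_{-he_j}f - f) - \tfrac{\partial f}{\partial x_j}\Big\|_1 \to 0.$$

The left side equals $\big|\widehat f(y)\,h^{-1}(e^{2\pi i h e_j\cdot y} - 1) - \mathcal{F}\big(\frac{\partial f}{\partial x_j}\big)(y)\big|$ by (b), and this tends 
to $\big|\widehat f(y)\,2\pi i y_j - \mathcal{F}\big(\frac{\partial f}{\partial x_j}\big)(y)\big|$. Hence $\mathcal{F}\big(\frac{\partial f}{\partial x_j}\big)(y) = 2\pi i y_j\,\widehat f(y)$. 
For the second part of (g), let $x'$ denote the tuple of the $N - 1$ variables other 
than $x_j$. Then integration by parts in the variable $x_j$ gives 

$$\mathcal{F}\big(\tfrac{\partial f}{\partial x_j}\big)(y) = \int_{\mathbb{R}^{N-1}}\int_{-\infty}^{\infty} \tfrac{\partial f}{\partial x_j}(x)\,e^{-2\pi i x\cdot y}\,dx_j\,dx'$$
$$= \int_{\mathbb{R}^{N-1}} \lim_n \int_{-n}^{n} \tfrac{\partial f}{\partial x_j}(x)\,e^{-2\pi i x\cdot y}\,dx_j\,dx'$$
$$= \int_{\mathbb{R}^{N-1}} \lim_n \big[f(x)e^{-2\pi i x\cdot y}\big]_{x_j=-n}^{x_j=n}\,dx' + 2\pi i y_j\int_{\mathbb{R}^{N-1}} \lim_n \int_{-n}^{n} f(x)\,e^{-2\pi i x\cdot y}\,dx_j\,dx'$$
$$= 0 + 2\pi i y_j\,\widehat f(y),$$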


as asserted. 


2. Fourier Transform on $L^1$, Inversion Formula 


The main theorem of this section is the Fourier inversion formula for $L^1(\mathbb{R}^N)$. 
The Fourier transform for $\mathbb{R}^1$ is the analog for the line of the mapping that carries a 
function $f$ on the circle to its doubly infinite sequence $\{c_k\}$ of Fourier coefficients. 
The inversion problem for the circle amounts to recovering $f$ from the $c_k$'s. We 
know that the procedure is to form the partial sums $s_n(x) = \sum_{k=-n}^{n} c_k e^{ikx}$ and to 
look for a sense in which $\{s_n\}$ converges to $f$. There is no problem for the case 
that $f$ is itself a trigonometric polynomial; then $s_n$ will be equal to $f$ for large 
enough $n$, and no passage to the limit is necessary. 

The situation with the Fourier transform is different. There is no readily 
available nonzero integrable function on the line analogous to an exponential on 
the circle for which we know an inversion formula with all constants in place. In 
order to obtain such an inversion formula for the Fourier transform on $L^1$, it is 
necessary to be able to invert the Fourier transform of some particular nonzero 
function explicitly. This step is carried out in Proposition 8.2 below, and then 
we can address the inversion problem of $L^1(\mathbb{R}^N)$ in general. The analog for the 
circle of what we shall prove for the line is a rather modest result: It would say 
that if $\sum |c_k|$ is finite, then the sequence of partial sums converges uniformly 
to a function that equals $f$ almost everywhere. The uniform convergence is a 
relatively trivial conclusion, being an immediate consequence of the Weierstrass 
$M$ test; but the conclusion that we recover $f$ lies deeper and incorporates a version 
of the uniqueness theorem. 


Proposition 8.2. $\mathcal{F}(e^{-\pi|x|^2}) = e^{-\pi|y|^2}$. 


REMARKS. Readers who know about the Cauchy Integral Theorem from 
complex-variable theory or else Green's Theorem in the theory of line integrals 
will recognize that the calculation below amounts to an application of one or the 
other of these theorems to the function $e^{-\pi z^2}$ over a long thin geometric rectangle 
next to the $x$ axis in $\mathbb{C}$. However, the present application of either of these 
theorems is so simple that we can without difficulty substitute a proof of one of 
these theorems in the special case of interest, and hence neither of these other 
theorems needs to be assumed. As the proof below will show, matters come down 
to the Fundamental Theorem of Calculus in its traditional form (Theorem 1.32). 


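PROOF. The question is whether 

$$\int_{\mathbb{R}^N} e^{-\pi(x_1^2 + \cdots + x_N^2)}\,e^{-2\pi i(x_1y_1 + \cdots + x_Ny_N)}\,dx_1\cdots dx_N \overset{?}{=} e^{-\pi(y_1^2 + \cdots + y_N^2)},$$

and the integral on the left is the product of $N$ integrals in one variable. Thus the 
question is whether 

$$\int_{-\infty}^{\infty} e^{-\pi(x^2 + 2ixy)}\,dx \overset{?}{=} e^{-\pi y^2}.$$

We start by observing that 

$$\int_{-\infty}^{\infty} e^{-\pi(x^2 + 2ixy)}\,dx = e^{-\pi y^2}\int_{-\infty}^{\infty} e^{-\pi(x + iy)^2}\,dx. \qquad (*)$$

Write 

$$e^{-\pi(x + iy)^2} = u(x, y) + iv(x, y) = e^{-\pi(x^2 - y^2)}\cos 2\pi xy - ie^{-\pi(x^2 - y^2)}\sin 2\pi xy.$$

Direct calculation gives$^1$ 

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \qquad (**)$$

Regard $n$ as positive and large. Then 

$$\int_{-n}^{n} u(s, 0)\,ds - \int_{-n}^{n} u(s, y)\,ds$$
$$= -\int_{-n}^{n}\int_0^y \frac{\partial u}{\partial y}(s, t)\,dt\,ds \qquad\text{by Theorem 1.32}$$
$$= +\int_{-n}^{n}\int_0^y \frac{\partial v}{\partial x}(s, t)\,dt\,ds \qquad\text{by } (**)$$
$$= \int_0^y\int_{-n}^{n} \frac{\partial v}{\partial x}(s, t)\,ds\,dt \qquad\text{by Fubini's Theorem}$$
$$= \int_0^y v(n, t)\,dt - \int_0^y v(-n, t)\,dt \qquad\text{by Theorem 1.32.}$$

With $y$ fixed we let $n$ tend to infinity. Then $v(n, t)$ and $v(-n, t)$ tend to 0 uniformly 
for $t$ between 0 and $y$ by inspection of $v$, and hence the right side of our expression 
tends to 0. Thus 

$$\int_{-\infty}^{\infty} u(s, 0)\,ds = \int_{-\infty}^{\infty} u(s, y)\,ds,$$

which says that 

$$\mathrm{Re}\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = \mathrm{Re}\int_{-\infty}^{\infty} e^{-\pi(x + iy)^2}\,dx. \qquad (\dagger)$$

Similarly we calculate 

$$\int_{-n}^{n} v(s, 0)\,ds - \int_{-n}^{n} v(s, y)\,ds = -\int_{-n}^{n}\int_0^y \frac{\partial v}{\partial y}(s, t)\,dt\,ds$$
$$= -\int_{-n}^{n}\int_0^y \frac{\partial u}{\partial x}(s, t)\,dt\,ds \qquad\text{by } (**)$$
$$= -\int_0^y\int_{-n}^{n} \frac{\partial u}{\partial x}(s, t)\,ds\,dt$$
$$= -\int_0^y u(n, t)\,dt + \int_0^y u(-n, t)\,dt.$$

Again we can see that the right side tends to 0, and thus 

$$\int_{-\infty}^{\infty} v(s, 0)\,ds = \int_{-\infty}^{\infty} v(s, y)\,ds,$$

which says that 

$$\mathrm{Im}\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = \mathrm{Im}\int_{-\infty}^{\infty} e^{-\pi(x + iy)^2}\,dx. \qquad (\dagger\dagger)$$

Taking $(*)$ into account and combining $(\dagger)$ and $(\dagger\dagger)$, we obtain 

$$\int_{-\infty}^{\infty} e^{-\pi(x^2 + 2ixy)}\,dx = e^{-\pi y^2}\int_{-\infty}^{\infty} e^{-\pi(x + iy)^2}\,dx = e^{-\pi y^2}\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx,$$

and the proposition follows from the formula $\int_{-\infty}^{\infty} e^{-\pi x^2}\,dx = 1$ given in 
Proposition 6.33. 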
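
As a numerical sanity check of Proposition 8.2 in one variable, the following minimal Python sketch approximates $\int_{-\infty}^{\infty} e^{-\pi x^2}e^{-2\pi i xy}\,dx$ by the trapezoid rule and compares the result with $e^{-\pi y^2}$. The helper name `fourier_transform_1d`, the truncation of the integral to $[-8, 8]$, and the number of steps are illustrative assumptions and are not taken from the text.

```python
import cmath
import math


def fourier_transform_1d(f, y, half_width=8.0, steps=4000):
    """Trapezoid approximation of the integral of f(x) e^{-2 pi i x y}
    over [-half_width, half_width] (an assumed truncation of the real line)."""
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # endpoint weights of the trapezoid rule
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * y)
    return total * h


def gaussian(x):
    """The function e^{-pi x^2} of Proposition 8.2 with N = 1."""
    return math.exp(-math.pi * x * x)


if __name__ == "__main__":
    for y in (0.0, 0.5, 1.0, 1.5):
        approx = fourier_transform_1d(gaussian, y)
        exact = math.exp(-math.pi * y * y)
        print(y, approx.real, exact)
```

The imaginary part of the approximation should be negligible, in line with the fact that the transform of an even real function is real.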


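$^1$The equations $(**)$ are called the Cauchy–Riemann equations. They occur again in Section 7. 

We shall use dilations to create an approximate identity out of $e^{-\pi|x|^2}$ in the 
style of Section VI.2. Put $\varphi(x) = e^{-\pi|x|^2}$ and define $\varphi_\varepsilon(x) = \varepsilon^{-N}\varphi(\varepsilon^{-1}x)$ for 
$\varepsilon > 0$. Whenever $\varphi$ is an integrable function and $\varphi_\varepsilon$ is formed in this way, we 
have 

$$\widehat{\varphi_\varepsilon}(y) = \int_{\mathbb{R}^N} \varphi_\varepsilon(x)e^{-2\pi i x\cdot y}\,dx = \varepsilon^{-N}\int_{\mathbb{R}^N} \varphi(\varepsilon^{-1}x)e^{-2\pi i x\cdot y}\,dx = \int_{\mathbb{R}^N} \varphi(x)e^{-2\pi i x\cdot \varepsilon y}\,dx = \widehat\varphi(\varepsilon y),$$

the next-to-last equality following from the change of variables $\varepsilon^{-1}x \mapsto x$. 

For the particular function $\varphi(x) = e^{-\pi|x|^2}$, this calculation shows that $\widehat{\varphi_\varepsilon}(y) = 
e^{-\pi\varepsilon^2|y|^2}$. In particular, $\widehat{\varphi_\varepsilon}$ is $\ge 0$ and vanishes at $\infty$ for each fixed $\varepsilon > 0$. As $\varepsilon$ 
decreases to 0, $\widehat{\varphi_\varepsilon}$ increases pointwise to the constant function 1. The constant 
$c$ in Theorem 6.20 for this $\varphi$ is $c = \int_{\mathbb{R}^N} \varphi(x)\,dx = 1$ by Proposition 6.33. That 
theorem gives various convergence results for $\varphi_\varepsilon * f$, one of which is that $\varphi_\varepsilon * f$ 
converges to $f$ in $L^1$ if $f$ is in $L^1$. 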


Theorem 8.3 (Riemann–Lebesgue Lemma). If $f$ is in $L^1(\mathbb{R}^N)$, then the 
continuous function $\widehat f$ vanishes at infinity. 


PROOF. The continuity of $\widehat f$ is by Proposition 8.1a. Put $\varphi(x) = e^{-\pi|x|^2}$ and 
form $\varphi_\varepsilon$. Then parts (c) and (a) of Proposition 8.1 give 

$$\|\widehat{\varphi_\varepsilon}\,\widehat f - \widehat f\|_{\sup} = \|\widehat{\varphi_\varepsilon * f - f}\|_{\sup} \le \|\varphi_\varepsilon * f - f\|_1,$$

and Theorem 6.20 shows that the right side tends to 0 as $\varepsilon$ decreases to 0. Hence 
$e^{-\pi\varepsilon^2|y|^2}\widehat f(y)$ tends to $\widehat f(y)$ uniformly in $y$. Since $\widehat f$ is bounded (Proposition 
8.1a), $e^{-\pi\varepsilon^2|y|^2}\widehat f(y)$ vanishes at infinity. The uniform limit of functions vanishing 
at infinity vanishes at infinity, and the theorem follows. 


Theorem 8.4 (Fourier inversion formula). If $f$ is in $L^1(\mathbb{R}^N)$ and $\widehat f$ is in 
$L^1(\mathbb{R}^N)$, then $f$ can be redefined on a set of measure 0 so as to be continuous. 
After this adjustment, 

$$f(x) = \int_{\mathbb{R}^N} \widehat f(y)\,e^{2\pi i x\cdot y}\,dy.$$


PROOF. By way of preliminaries, recall from Proposition 8.1e that the 
multiplication formula gives $\int f\,\widehat g\,dx = \int \widehat f\,g\,dx$ whenever $f$ and $g$ are both integrable. 
With $\varepsilon$ fixed for the moment, let us apply this formula with $g(x) = e^{-\pi\varepsilon^2|x|^2}$. The 
remarks before Theorem 8.3 about how the Fourier transform interacts with 
dilations show that $\widehat g(y) = \varepsilon^{-N}e^{-\pi\varepsilon^{-2}|y|^2}$. In other words, if we take $\varphi(x) = e^{-\pi|x|^2}$, 
then 

$$\int_{\mathbb{R}^N} f(x)\varphi_\varepsilon(x)\,dx = \int_{\mathbb{R}^N} \widehat f(y)\,e^{-\pi\varepsilon^2|y|^2}\,dy. \qquad (*)$$

To prove the theorem, consider first the special case that $f$ is bounded and 
continuous. If we let $\varepsilon$ decrease to 0 in $(*)$, the left side tends to $f(0)$ by Theorem 
6.20c, and the right side tends to $\int_{\mathbb{R}^N} \widehat f(y)\,dy$ by dominated convergence since 
$\widehat f$ is assumed integrable. Thus $f(0) = \int_{\mathbb{R}^N} \widehat f(y)\,dy$. Applying this conclusion 
to the translate $\tau_{-x}f$ and using Proposition 8.1b, we obtain 

$$f(x) = (\tau_{-x}f)(0) = \int \widehat{\tau_{-x}f}(y)\,dy = \int \widehat f(y)\,e^{2\pi i x\cdot y}\,dy,$$

as required. 

Without the special assumption on $f$, we adjust the above argument a little. 
Using the equality $\varphi_\varepsilon(-y) = \varphi_\varepsilon(y)$, we apply $(*)$ to the translate $\tau_{-x}f$ of $f$ to 
get 

$$\int \widehat f(y)\,e^{2\pi i x\cdot y}e^{-\pi\varepsilon^2|y|^2}\,dy = \int f(x + y)\varphi_\varepsilon(y)\,dy = \int f(x - y)\varphi_\varepsilon(y)\,dy = (\varphi_\varepsilon * f)(x).$$

As $\varepsilon$ decreases to 0, the left side tends pointwise to $\int \widehat f(y)\,e^{2\pi i x\cdot y}\,dy$ by dominated 
convergence, and the result is a continuous function of $x$, by a version of 
Proposition 8.1a. The right side tends to $f$ in $L^1$ by Theorem 6.20, and hence Theorem 
5.59 shows that a subsequence of $\varphi_\varepsilon * f$ tends to $f$ almost everywhere. Thus 
$f(x) = \int_{\mathbb{R}^N} \widehat f(y)\,e^{2\pi i x\cdot y}\,dy$ almost everywhere, with the right side continuous. 


Corollary 8.5. The Fourier transform is one-one on $L^1(\mathbb{R}^N)$. 


PROOF. If $f$ is in $L^1$ and $\widehat f$ is identically 0, then $\widehat f$ is in $L^1$, and the inversion 
formula (Theorem 8.4) applies. Hence $f$ is 0 almost everywhere. 


3. Fourier Transform on $L^2$, Plancherel Formula 


We mentioned in Section 1 that the Fourier transform is of great importance in 
analyzing operators that commute with translations. The initial analysis of such 
operators is done on $L^2(\mathbb{R}^N)$, and this section describes some of how that analysis 
comes about. The first result is the theorem for $\mathbb{R}^N$ that is the analog of Parseval's 
Theorem for the circle. 


Theorem 8.6 (Plancherel formula). If $f$ is in $L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$, then 
$\|\widehat f\|_2 = \|f\|_2$. 


REMARKS. There is a formal computation that is almost a proof, namely 

$$\int |f(x)|^2\,dx = \int f^*(-x)f(x)\,dx = (f^* * f)(0) = \int \widehat{f^* * f}(y)\,dy = \int \overline{\widehat f(y)}\,\widehat f(y)\,dy = \int |\widehat f(y)|^2\,dy,$$

the middle equality using the Fourier inversion formula (Theorem 8.4). What is 
needed in order to make this computation into a proof is a verification that the 
Fourier inversion formula actually applies. We know that $f^* * f$ is in $L^1$ since $f^*$ 
and $f$ are in $L^1$, and we know from Proposition 6.18 that $f^* * f$ is continuous, 
being in $L^2 * L^2$. But it is not immediately obvious that the Fourier transform 
to which the inversion formula is to be applied, namely $\widehat{f^* * f} = |\widehat f|^2$, is in $L^1$. 
We handle this question by proving a lemma that is a little more general than 
necessary. 


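Lemma 8.7. Suppose $f$ is in $L^1(\mathbb{R}^N)$, is bounded on $\mathbb{R}^N$, and is continuous 
at 0. If $\widehat f(y) \ge 0$ for all $y$, then $\widehat f$ is in $L^1(\mathbb{R}^N)$. 


PROOF. Put $\varphi(x) = e^{-\pi|x|^2}$ and $\varphi_\varepsilon(x) = \varepsilon^{-N}\varphi(\varepsilon^{-1}x)$. Then the function 
$\varphi_\varepsilon * f$ is continuous by Proposition 6.18 since $\varphi_\varepsilon$ is in $L^\infty$ and $f$ is in $L^1$, and 

$$\lim_{\varepsilon\downarrow 0}\,(\varphi_\varepsilon * f)(0) = f(0)$$

by Theorem 6.20c. The function $\widehat{\varphi_\varepsilon}$ is in $L^1$, and $\widehat f$ is bounded. Hence $\widehat{\varphi_\varepsilon * f} = 
\widehat{\varphi_\varepsilon}\,\widehat f$ is in $L^1$. By the Fourier inversion formula (Theorem 8.4), 

$$(\varphi_\varepsilon * f)(0) = \int_{\mathbb{R}^N} \widehat f(y)\,e^{-\pi\varepsilon^2|y|^2}\,dy.$$

Letting $\varepsilon$ decrease to 0 and taking into account the monotone convergence, we 
obtain $f(0) = \int_{\mathbb{R}^N} \widehat f(y)\,dy$. Therefore $\widehat f$ is integrable. 


PROOF OF THEOREM 8.6. The remarks after the statement of the theorem prove 
everything except that the Fourier transform $\widehat{f^* * f} = |\widehat f|^2$ is in $L^1$, and this step 
is carried out by Lemma 8.7. 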
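
A numerical illustration of the Plancherel formula may be helpful here. The sketch below takes the example $f(x) = e^{-|x|}$ on $\mathbb{R}^1$, whose transform under the convention of this chapter is $\widehat f(y) = 2/(1 + 4\pi^2y^2)$ (a standard computation that is assumed here rather than quoted from the text), checks that closed form against a direct quadrature, and then compares $\|f\|_2^2$ with $\|\widehat f\|_2^2$. All function names, truncation widths, and step counts are hypothetical choices made for the illustration.

```python
import math


def transform_of_two_sided_exponential(y, half_width=40.0, steps=40000):
    """Trapezoid approximation of the Fourier transform of e^{-|x|} at y.

    By symmetry the transform equals 2 * integral_0^infinity e^{-x} cos(2 pi x y) dx;
    the integral is truncated to [0, half_width].
    """
    h = half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-x) * math.cos(2.0 * math.pi * x * y)
    return 2.0 * total * h


def closed_form(y):
    """Assumed closed form 2 / (1 + (2 pi y)^2) for the transform of e^{-|x|}."""
    return 2.0 / (1.0 + (2.0 * math.pi * y) ** 2)


def l2_norm_squared(g, half_width, steps):
    """Trapezoid approximation of the integral of |g|^2 over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        t = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(g(t)) ** 2
    return total * h


norm_f = l2_norm_squared(lambda x: math.exp(-abs(x)), 40.0, 80000)
norm_f_hat = l2_norm_squared(closed_form, 50.0, 100000)

if __name__ == "__main__":
    print(norm_f, norm_f_hat)  # both should be close to 1
```

Both quantities approximate the exact value 1, up to the truncation and discretization errors of the quadrature.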


Abstract linear operators between normed linear spaces were introduced in 
Section V.9, and Proposition 5.57 showed that boundedness is equivalent to 
uniform continuity. Let us make use of such operators now. 

Theorem 8.6 allows us to extend the Fourier transform for $\mathbb{R}^N$ from $L^1 \cap L^2$ to 
$L^2$. In fact, Proposition 5.56 shows that $L^1 \cap L^2$ is dense in $L^2$. The conclusion 
of Theorem 8.6 implies that the linear operator $\mathcal{F}$ is bounded relative to the 
$L^2$ norms on domain and range, and hence it is uniformly continuous. Since 
the range space $L^2$ is complete (Theorem 5.59), Proposition 2.47 shows that $\mathcal{F}$ 
extends to a continuous map $\mathcal{F} : L^2 \to L^2$. This extended map, also called $\mathcal{F}$, 
is readily checked to be linear and then is a bounded linear operator satisfying 
$\|\mathcal{F}f\|_2 = \|f\|_2$ for all $f$ in $L^2$. 

If $f$ is in $L^2(\mathbb{R}^N)$, we can use any approximating sequence from $L^1 \cap L^2$ to 
obtain a formula for $\mathcal{F}f$. One such is $f\,I_{B(R;0)}$, as $R$ increases to infinity through 
some sequence. Thus 

$$\mathcal{F}f(y) = \lim_{R\to\infty} \int_{|x|\le R} f(x)\,e^{-2\pi i x\cdot y}\,dx \qquad\text{(limit in the } L^2 \text{ sense).}$$

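Corollary 8.8. If $f$ is in $L^1(\mathbb{R}^N)$ and $g$ is in $L^2(\mathbb{R}^N)$, then $\mathcal{F}(f * g) = 
\mathcal{F}(f)\mathcal{F}(g)$ and $\mathcal{F}(g^*) = \overline{\mathcal{F}(g)}$. 


PROOF. Set $g_n = g\,I_{B(n;0)}$, so that $g_n$ is in $L^1 \cap L^2$ for all $n$ and $g_n \to g$ in 
$L^2$. Then $f * g_n \to f * g$ in $L^2$ since $\|f * g_n - f * g\|_2 = \|f * (g_n - g)\|_2 \le 
\|f\|_1\|g_n - g\|_2$. Therefore $\mathcal{F}(f)\mathcal{F}(g_n) = \mathcal{F}(f * g_n) \to \mathcal{F}(f * g)$ in $L^2$. Since 
$\mathcal{F}(f)$ is a bounded function and $\mathcal{F}(g_n) \to \mathcal{F}(g)$ in $L^2$, we see that $\mathcal{F}(f)\mathcal{F}(g_n) \to 
\mathcal{F}(f)\mathcal{F}(g)$ in $L^2$. Hence $\mathcal{F}(f * g) = \mathcal{F}(f)\mathcal{F}(g)$. The identity $\mathcal{F}(g^*) = \overline{\mathcal{F}(g)}$ is 
proved similarly. 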


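Theorem 8.9 (Riesz–Fischer Theorem). The Fourier transform operator $\mathcal{F}$ 
carries $L^2(\mathbb{R}^N)$ onto $L^2(\mathbb{R}^N)$. 


PROOF. The operator $\mathcal{F}$ is built from the integral $\int_{\mathbb{R}^N} f(x)e^{-2\pi i x\cdot y}\,dx$. In 
a similar fashion, build an operator $\mathcal{I}$ from $\int_{\mathbb{R}^N} f(x)e^{2\pi i x\cdot y}\,dx$, or equivalently 
define $\mathcal{I}f(y) = \mathcal{F}f(-y)$. Then $\|\mathcal{I}f\|_2 = \|f\|_2$ for $f$ in $L^2$. It is sufficient to 
prove that $\mathcal{F}\mathcal{I} = 1$ on $L^2$, since for any $f$ in $L^2$, the equation $\mathcal{F}\mathcal{I} = 1$ implies 
that $\mathcal{I}f$ is a member of $L^2$ carried to $f$ by $\mathcal{F}$. Moreover, $\mathcal{F}\mathcal{I}$ is continuous, 
being bounded. It is therefore enough to prove that $\mathcal{F}\mathcal{I}f = f$ for $f$ in a dense 
subspace of $L^2$. We shall do so for the dense subspace $L^1 \cap L^2$. 

For a function $f$ in $L^1 \cap L^2$ with the additional property that $\widehat f$ is in $L^1$ (and also 
$L^2$ by Theorem 8.6), Theorem 8.4 for $\mathcal{I}$ (or Theorem 8.4 applied to the function 
$f(-x)$) shows that $\mathcal{F}\mathcal{I}f = f$. 

For a general $f$ in $L^1 \cap L^2$, form $\varphi_\varepsilon * f$, where $\varphi(x) = e^{-\pi|x|^2}$. Then 
$\widehat{\varphi_\varepsilon * f} = \widehat{\varphi_\varepsilon}\,\widehat f$ is in $L^1 \cap L^2$; in fact, it is in $L^2$ by Proposition 6.14 and Theorem 
8.6, and it is in $L^1$ because $\widehat f$ is bounded and $\widehat{\varphi_\varepsilon}$ is in $L^1$. By the special case just 
proved, $\mathcal{F}\mathcal{I}(\varphi_\varepsilon * f) = \varphi_\varepsilon * f$. Since $\mathcal{F}\mathcal{I}$ is continuous and $\varphi_\varepsilon * f \to f$ in $L^2$ 
by Theorem 6.20a, $\mathcal{F}\mathcal{I}f = f$. Thus $\mathcal{F}\mathcal{I}f = f$ on the dense subspace $L^1 \cap L^2$, 
and the proof is complete. 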


We shall be interested especially in bounded linear operators $A$ on $L^2(\mathbb{R}^N)$ 
that commute with translations, i.e., that satisfy $A(\tau_x f) = \tau_x(Af)$ for all $x$ in $\mathbb{R}^N$ 
and all $f$ in $L^2$. Recall that the operator norm $\|A\|$ of a bounded linear operator 
on $L^2$ is the least $C$ such that $\|Af\|_2 \le C\|f\|_2$ for all $f$ in $L^2$. 

EXAMPLES. 


(1) The translation $\tau_{x_0}$ is an example of a bounded linear operator on $L^2$ 
that commutes with translations; the commutativity in question follows from the 
commutativity of $\mathbb{R}^N$ as an additive group, and the equality $\|\tau_{x_0}f\|_2 = \|f\|_2$ 
shows that $\tau_{x_0}$ is bounded with $\|\tau_{x_0}\| = 1$. In terms of Fourier transforms, 
Proposition 8.1 shows that $\widehat{\tau_{x_0}f}(y) = e^{-2\pi i x_0\cdot y}\,\widehat f(y)$. 


(2) Another example of a bounded linear operator on $L^2$ that commutes with 
translations is the operator $Ag = f * g$ for fixed $f$ in $L^1$. This commutes 
with translations by Proposition 6.15, and it is bounded with $\|A\| \le \|f\|_1$ by 
Proposition 6.14. Proposition 8.1 shows that $\widehat{Ag} = \widehat f\,\widehat g$. 


(3) Let $M(y)$ be any $L^\infty$ function on $\mathbb{R}^N$, and for $f$ in $L^2$, define $Af$ by 
$\widehat{Af} = M\widehat f$. The function $\widehat f$ is in $L^2$ by the Plancherel Theorem, $M\widehat f$ is in $L^2$ 
since $M$ is essentially bounded, and $M\widehat f$ is the Fourier transform of some $L^2$ 
function by the Riesz–Fischer Theorem. We take this $L^2$ function to be $Af$. The 
brief formula is $Af = \mathcal{F}^{-1}(M\mathcal{F}f)$. From the inequalities $\|Af\|_2 = \|M\mathcal{F}f\|_2 \le 
\|M\|_\infty\|\mathcal{F}f\|_2 = \|M\|_\infty\|f\|_2$, we see that $A$ is bounded with $\|A\| \le \|M\|_\infty$. The 
bounded linear operator $A$ commutes with translations, since 

$$\mathcal{F}(A(\tau_{x_0}f))(y) = \mathcal{F}\mathcal{F}^{-1}M\mathcal{F}\tau_{x_0}f(y) = M(y)\,\mathcal{F}\tau_{x_0}f(y) = M(y)\,e^{-2\pi i x_0\cdot y}\,\mathcal{F}f(y)$$
and 
$$\mathcal{F}(\tau_{x_0}(Af))(y) = e^{-2\pi i x_0\cdot y}\,\mathcal{F}(Af)(y) = e^{-2\pi i x_0\cdot y}\,M(y)\,\mathcal{F}f(y).$$


One speaks of the function $M$ as a multiplier on $L^2$. The previous two 
examples are instances of this construction. Example 1 has $M(y) = e^{-2\pi i x_0\cdot y}$, and 
Example 2 has $M(y) = \widehat f(y)$. We shall see in Theorem 8.14 in the next section 
that every bounded linear operator $A$ on $L^2$ commuting with translations arises 
from some such essentially bounded $M$ and that $\|A\| = \|M\|_\infty$; for this reason a 
bounded linear operator on $L^2$ that commutes with translations is often called a 
"multiplier operator" or a "bounded multiplier operator" on $L^2$. 
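
The multiplier construction of Example 3 has a finite analog that can be tested directly. The sketch below is only an illustration under stated assumptions: it replaces $\mathbb{R}^N$ by the cyclic group of integers modulo $n$, replaces $\mathcal{F}$ by the discrete Fourier transform, and uses hypothetical helper names (`dft`, `idft`, `translate`, `multiplier_operator`). It shows that an operator of the form "transform, multiply by $M$, transform back" commutes with cyclic translations and has norm at most $\max|M|$, mirroring the computation in Example 3.

```python
import cmath
import math


def dft(v):
    """Discrete Fourier transform: w[k] = sum_j v[j] e^{-2 pi i j k / n}."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft(w):
    """Inverse transform: v[j] = (1/n) sum_k w[k] e^{2 pi i j k / n}."""
    n = len(w)
    return [sum(w[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def translate(v, s):
    """Cyclic analog of tau_s: (tau_s v)[j] = v[j - s], indices taken mod n."""
    n = len(v)
    return [v[(j - s) % n] for j in range(n)]


def multiplier_operator(m, v):
    """Discrete analog of A f = F^{-1}(M F f) for a multiplier sequence m."""
    return idft([mk * wk for mk, wk in zip(m, dft(v))])


def l2_norm(v):
    return math.sqrt(sum(abs(z) ** 2 for z in v))


if __name__ == "__main__":
    n = 12
    v = [complex(math.sin(j) + 0.1 * j, math.cos(2 * j)) for j in range(n)]
    m = [complex(math.cos(k), math.sin(3 * k)) / 2 for k in range(n)]
    lhs = multiplier_operator(m, translate(v, 5))
    rhs = translate(multiplier_operator(m, v), 5)
    print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # difference at rounding level
```

The norm bound in the discrete setting follows from the discrete Parseval identity in the same way that the bound $\|A\| \le \|M\|_\infty$ follows from the Plancherel formula.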


4. Schwartz Space 


This section introduces the space $\mathcal{S}(\mathbb{R}^N)$ of Schwartz functions on $\mathbb{R}^N$. This 
space is a vector subspace of $L^1(\mathbb{R}^N)$, so that the Fourier transform is given on it 
by the usual concrete formula; $\mathcal{S}(\mathbb{R}^N)$ will turn out to be another space besides $L^2$ 
that is carried onto itself by the Fourier transform. Working with $\mathcal{S}(\mathbb{R}^N)$ provides 
a convenient way for using the Fourier transform and derivatives together, as 
becomes clearer when one studies partial differential equations. 

If $Q$ is a complex-valued polynomial on $\mathbb{R}^N$, define $Q(D)$ to be the partial 
differential operator with constant coefficients obtained by substituting, for each 
$j$ with $1 \le j \le N$, the operator $D_j = \frac{\partial}{\partial x_j}$ for $x_j$. A Schwartz function on $\mathbb{R}^N$ is 
a smooth function $f$ such that $P(x)Q(D)f$ is bounded for each pair of polynomials 
$P$ and $Q$. An example is the function $e^{-\pi|x|^2}$ since its iterated partial derivatives 
are all of the form $R(x)e^{-\pi|x|^2}$ for some polynomial $R$. The Schwartz space 
$\mathcal{S} = \mathcal{S}(\mathbb{R}^N)$ is the set of all Schwartz functions. 

The Schwartz space $\mathcal{S}$ is evidently a vector space, and it is closed under 
partial differentiation and under multiplication by polynomials. Closure under 
partial differentiation is in effect built into the definition. To see closure under 
multiplication by polynomials, it is enough to check closure under multiplication 
by each monomial $x_j$. This closure follows readily from the formula 
$Q(D)(x_j f) = Q^*(D)f + x_jQ(D)f$, where $Q^*$ is 0 or is a polynomial having 
degree strictly lower than $Q$ has. 

If $f$ is a Schwartz function, then $P(x)Q(D)f$ is actually integrable, as well as 
bounded, for each pair of polynomials $P$ and $Q$. In fact, $(1 + |x|^2)^N P(x)Q(D)f$ 
is bounded, and therefore $|P(x)Q(D)f|$ is $\le$ a multiple of the integrable function 
$(1 + |x|^2)^{-N}$. In particular, $\mathcal{S}$ is contained in $L^1$, $L^2$, and $L^\infty$. 

Finally the Fourier transform $\mathcal{F}$ carries $\mathcal{S}$ into itself. In fact, parts (f) and (g) 
of Proposition 8.1 give 

$$P(x)Q(D)\widehat f = P(x)\,\mathcal{F}(Q(-2\pi i x)f) = \mathcal{F}\big(P((2\pi i)^{-1}D)\,Q(-2\pi i x)f\big),$$

and the right side is the Fourier transform of an $L^1$ function and therefore is 
bounded. 


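Proposition 8.10. The Fourier transform $\mathcal{F}$ is one-one from $\mathcal{S}(\mathbb{R}^N)$ onto 
$\mathcal{S}(\mathbb{R}^N)$, and the Fourier inversion formula holds on $\mathcal{S}(\mathbb{R}^N)$. 


PROOF. Since $\mathcal{S} \subseteq L^1$, $\mathcal{F}$ is one-one on $\mathcal{S}$ as a consequence of Corollary 8.5. 
Since $\mathcal{F}(\mathcal{S}) \subseteq \mathcal{S} \subseteq L^1$, Theorem 8.4 shows that the Fourier inversion formula 
holds on $\mathcal{S}$. Let $(\mathcal{I}f)(x) = (\mathcal{F}f)(-x)$ for $f$ in $L^1$. Then $\mathcal{I}(\mathcal{S}) \subseteq \mathcal{S}$. The 
Riesz–Fischer Theorem (Theorem 8.9) shows that $\mathcal{F}\mathcal{I} = 1$ on $L^1 \cap L^2$, and 
hence $\mathcal{F}\mathcal{I} = 1$ on $\mathcal{S}$ as well. Therefore if $f$ is in $\mathcal{S}$, then $g = \mathcal{I}f$ is a member of 
$\mathcal{S}$ such that $\mathcal{F}g = f$, and we conclude that $\mathcal{F}$ carries $\mathcal{S}$ onto $\mathcal{S}$. 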


To make effective use of Proposition 8.10, we need to know that $\mathcal{S}(\mathbb{R}^N)$ is quite 
large, large enough so that we can shape functions suitably when we need them. 

For $U$ open in $\mathbb{R}^N$, let $C^\infty_{\mathrm{com}}(U)$ denote the vector space of smooth 
complex-valued functions on $U$ whose support is a compact subset of $U$. It is apparent 
that $C^\infty_{\mathrm{com}}(U)$ is closed under pointwise multiplication and that every member of 
$C^\infty_{\mathrm{com}}(U)$ extends to a member of $C^\infty_{\mathrm{com}}(\mathbb{R}^N)$ when set equal to 0 off $U$. But it is 
not apparent that $C^\infty_{\mathrm{com}}(U)$ contains nonzero functions. We shall construct some. 




Lemma 8.11. If $\delta_1$ and $\delta_2$ are given positive numbers with $\delta_1 < \delta_2$, then there 
exists $\psi$ in $C^\infty_{\mathrm{com}}(\mathbb{R}^N)$ such that $\psi(x)$ depends only on $|x|$, $\psi$ is nonincreasing in 
$|x|$, $\psi$ takes values in $[0, 1]$, $\psi(x) = 1$ for $|x| \le \delta_1$, and $\psi(x) = 0$ for $|x| \ge \delta_2$. 


PROOF. We begin from the statement in Section I.7 that the function $f : \mathbb{R} \to \mathbb{R}$ 
with $f(t)$ equal to $e^{-1/t^2}$ for $t > 0$ and equal to 0 for $t \le 0$ is smooth everywhere, 
including at $t = 0$. (The verification that $f$ is smooth occurs in Problems 20–22 
at the end of Chapter I.) If $\delta > 0$, then it follows that the function $g_\delta(t) = 
f(\delta + t)f(\delta - t)$ is smooth. Consequently the function $h_\delta(t) = \int_{-\infty}^{t} g_\delta(s)\,ds$ is 
smooth, is nondecreasing, is 0 for $t \le -\delta$, is some positive constant for $t \ge \delta$, 
and takes only values between 0 and that positive constant. Forming the function 
$h_{\delta,r}(t) = h_\delta(r + t)h_\delta(r - t)$ with $r$ at least $3\delta$ and dilating it suitably, we obtain a 
smooth even function $\psi_0(t)$ with values in $[0, c]$, the function being identically 0 
for $|t| \ge \delta_2$ and being identically $c$ for $|t| \le \delta_1$. Putting $\psi(x) = c^{-1}\psi_0(|x|)$, we 
obtain the desired function. 
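
For readers who want to experiment with such cutoff functions, here is a minimal Python sketch. It starts from the same function $f(t) = e^{-1/t^2}$ for $t > 0$, but it then uses a different, commonly used device than the integral $h_\delta$ of the proof: the quotient $f(s)/(f(s) + f(1 - s))$, which is a smooth step from 0 to 1. The names `smooth_step` and `psi` are hypothetical, and the construction is offered as one way to obtain a function with the properties in Lemma 8.11, not as the construction of the text.

```python
import math


def f(t):
    """The smooth function e^{-1/t^2} for t > 0, extended by 0 for t <= 0."""
    if t <= 0.0:
        return 0.0
    u = 1.0 / t
    return math.exp(-u * u)  # underflows harmlessly to 0.0 for very small t


def smooth_step(s):
    """Smooth nondecreasing step: 0 for s <= 0, 1 for s >= 1.

    The denominator is never 0 because at least one of s and 1 - s is >= 1/2.
    """
    return f(s) / (f(s) + f(1.0 - s))


def psi(x, delta1, delta2):
    """Radial cutoff on R^N: 1 for |x| <= delta1, 0 for |x| >= delta2.

    x is a sequence of coordinates; requires 0 < delta1 < delta2.
    """
    r = math.sqrt(sum(c * c for c in x))
    return smooth_step((delta2 - r) / (delta2 - delta1))


if __name__ == "__main__":
    for r in (0.0, 0.4, 0.6, 0.75, 0.9, 1.0, 1.5):
        print(r, psi((r, 0.0), 0.5, 1.0))
```

Because the function is identically 1 near the origin, the nonsmoothness of $|x|$ at $x = 0$ causes no difficulty.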


Proposition 8.12. If $K$ and $U$ are subsets of $\mathbb{R}^N$ with $K$ compact, $U$ open, 
and $K \subseteq U$, then there exists $\varphi \in C^\infty_{\mathrm{com}}(U)$ with values in $[0, 1]$ such that $\varphi$ is 
identically 1 on $K$. 


PROOF. There is no loss of generality in assuming that $K$ is nonempty and $U$ 
is bounded. The continuous distance function $D(x, U^c)$ is everywhere positive 
on the compact set $K$ and hence assumes a positive minimum $c$. Define $K'$ to 
be the set $\{x \in \mathbb{R}^N \mid D(x, K) \le \tfrac14 c\}$. This set is compact, contains $K$, and 
has nonempty interior. Since the interior is nonempty, $K'$ has positive Lebesgue 
measure $|K'|$. Applying Lemma 8.11, let $h$ be a nonnegative smooth function 
that vanishes identically for $|x| \ge \tfrac14 c$ and has total integral 1. 

Define $\varphi = h * I_{K'}$, where $I_{K'}$ is the indicator function of $K'$. Corollary 6.19 
shows that $\varphi$ is smooth. The function $\varphi$ is $\ge 0$ and has $\sup|\varphi| \le \|h\|_1\|I_{K'}\|_\infty = 1$. 

We have $\varphi(x) = \int_{\mathbb{R}^N} h(x - y)I_{K'}(y)\,dy$. If $x$ is in $K$ and $h(x - y)$ is nonzero, 
then $|x - y| < \tfrac14 c$. Then $D(y, K) \le |x - y| < \tfrac14 c$, and $y$ is in $K'$. Hence 
$I_{K'}(y) = 1$, and $\varphi(x) = \int_{\mathbb{R}^N} h(x - y)\,dy = 1$. 

Next, suppose $D(x, U^c) \le \tfrac14 c$ and $h(x - y)$ is nonzero, so that again $|x - y| < 
\tfrac14 c$. The claim is that $y$ is not in $K'$, i.e., that $D(y, K) > \tfrac14 c$. Assuming the 
contrary, we can find, because of the compactness of $K$, some $k \in K$ with 
$|y - k| \le \tfrac14 c$. Then every $u^c \in U^c$ satisfies $c \le |u^c - k| \le |u^c - x| + 
|x - y| + |y - k| < |u^c - x| + \tfrac14 c + \tfrac14 c$, and we obtain $|u^c - x| > \tfrac12 c$. Taking the 
infimum over $u^c$, we obtain $D(x, U^c) \ge \tfrac12 c$, and this is a contradiction. Thus $y$ 
is not in $K'$, and the integrand is identically 0 whenever $D(x, U^c) \le \tfrac14 c$. Hence 
$\varphi(x) = 0$ if $D(x, U^c) \le \tfrac14 c$, and the support of $\varphi$ is a compact subset of $U$. This 
completes the proof. 

Every function in $C^\infty_{\mathrm{com}}(\mathbb{R}^N)$ is the Fourier transform of some Schwartz function 
by Proposition 8.10, and there are many such functions by Proposition 8.12. With 
this fact in hand, we can prove the theorem about operators commuting with 
translations that was promised in the previous section. We begin with a lemma. 


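Lemma 8.13. If $A$ is a bounded linear operator on $L^2(\mathbb{R}^N)$ commuting with 
translations, then $A$ commutes with convolution by any $L^1$ function. 


PROOF. We are to show that $A(f * g) = f * (Ag)$ if $f$ is in $L^1$ and $g$ is in $L^2$. Let 
$\varepsilon > 0$ be given. Corollary 6.17, with $g_1 = g$ and $g_2 = Ag$, shows that there exist 
$y_1, \ldots, y_n$ in $\mathbb{R}^N$ and constants $c_1, \ldots, c_n$ such that $\big\|f * g - \sum_{j=1}^{n} c_j\tau_{y_j}g\big\|_2 < \varepsilon$ 
and $\big\|f * Ag - \sum_{j=1}^{n} c_j\tau_{y_j}Ag\big\|_2 < \varepsilon$. Then we have 

$$\|A(f * g) - f * Ag\|_2 \le \Big\|A\Big(f * g - \sum_{j=1}^{n} c_j\tau_{y_j}g\Big)\Big\|_2 + \Big\|A\Big(\sum_{j=1}^{n} c_j\tau_{y_j}g\Big) - \sum_{j=1}^{n} c_j\tau_{y_j}Ag\Big\|_2 + \Big\|\sum_{j=1}^{n} c_j\tau_{y_j}Ag - f * Ag\Big\|_2.$$

The first term on the right side is $\le \|A\|\,\big\|f * g - \sum_{j=1}^{n} c_j\tau_{y_j}g\big\|_2 \le \varepsilon\|A\|$, the 
second term is 0 since $A$ commutes with translations, and the third term is $\le \varepsilon$ 
by construction. Since $\varepsilon$ is arbitrary, $A(f * g) = f * (Ag)$. 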


Theorem 8.14. If $A$ is a bounded linear operator on $L^2(\mathbb{R}^N)$ commuting with 
translations, then there exists an $L^\infty$ function $M$ such that $Af = \mathcal{F}^{-1}(M\mathcal{F}f)$ for 
all $f$ in $L^2$. As a member of $L^\infty$, $M$ is unique and satisfies $\|M\|_\infty = \|A\|$. 


REMARKS. The idea of the proof comes from the corresponding result for $L^2$ of 
the circle, where it is easy to define $M$. Call the operator $T$ in the case of the circle. 
Each function $e^{ikx}$ is in $L^2$, and the given operator $T$ satisfies $\tau_{x_0}(T(e^{ikx})) = 
T(\tau_{x_0}(e^{ikx})) = T(e^{ik(x - x_0)}) = e^{-ikx_0}T(e^{ikx})$. If we write $f$ for the $L^2$ function 
$T(e^{ikx})$ and form the Fourier series expansion $f(x) \sim \sum c_ne^{inx}$, then $\tau_{x_0}f$ has 
Fourier series $\tau_{x_0}f(x) \sim \sum c_ne^{-inx_0}e^{inx}$ by linearity and boundedness of $\tau_{x_0}$. 
Since we have just seen that $\tau_{x_0}f = e^{-ikx_0}f$, we conclude that $\sum c_ne^{-inx_0}e^{inx} = 
\sum c_ne^{-ikx_0}e^{inx}$. If $c_n \ne 0$ for some $n$ unequal to $k$, then we do not have the 
term-by-term match required by the uniqueness theorem. Hence only $c_k$ can 
be nonzero, and we have $T(e^{ikx}) = c_ke^{ikx}$. The number $c_k$ is the value of the 
multiplier $M$ at the integer $k$. In the actual setting of the theorem, the circle is 
replaced by $\mathbb{R}^N$, and individual exponential functions are not in $L^2$. Thus this 
easy process for obtaining $M$ is not available, and we are led to construct $M$ by 
successive approximations. 


PROOF. Choose, by Proposition 8.12, functions $\Phi_k \in C^\infty_{\mathrm{com}}(\mathbb{R}^N)$ with 
(i) $0 \le \Phi_k(y) \le 1$ for all $y$, 
(ii) $\Phi_k(y) = 0$ for $|y| \ge k + 1$, 
(iii) $\Phi_k(y) = 1$ for $|y| \le k$. 
Then $\Phi_j\Phi_k = \Phi_{\min(j,k)}$ if $j \ne k$, and $\Phi_k$ is in $C^\infty_{\mathrm{com}}(\mathbb{R}^N)$ and hence in the 
Schwartz space $\mathcal{S}(\mathbb{R}^N)$. Put $\varphi_k = \mathcal{F}^{-1}(\Phi_k)$. Proposition 8.10 shows that $\varphi_k$ is in 
$\mathcal{S}$, and therefore $\varphi_k$ is in $L^1 \cap L^2$. Since the Fourier transform carries convolution 
into pointwise product, we have $\varphi_j * \varphi_k = \varphi_{\min(j,k)}$ if $j \ne k$. Define 

$$M_k = \mathcal{F}(A\varphi_k)$$

as an $L^2$ function. Lemma 8.13 shows that $A$ commutes with convolution by an 
$L^1$ function, and thus $\varphi_k * A\varphi_{k+1} = A(\varphi_k * \varphi_{k+1}) = A\varphi_k = A(\varphi_k * \varphi_{k+2}) = 
\varphi_k * A\varphi_{k+2}$. Consequently 

$$\Phi_kM_{k+1} = \Phi_kM_{k+2}$$
and 
$$M_{k+1}(y) = M_{k+2}(y) \quad\text{for } |y| \le k.$$

This equation shows that if we put 

$$M(y) = M_{k+1}(y) \quad\text{for } |y| \le k,$$

then $M$ is consistently defined and is locally in $L^2$. 

Let $\mathcal{S}_0 = \mathcal{F}^{-1}(C^\infty_{\mathrm{com}}(\mathbb{R}^N)) \subseteq \mathcal{S}(\mathbb{R}^N)$. If a member $f$ of $\mathcal{S}_0$ has $\widehat f(y) = 0$ 
for $|y| \ge k$, then $\widehat f\,\Phi_{k+1} = \widehat f$ and hence $f * \varphi_{k+1} = f$. Application of $A$ gives 
$Af = A(f * \varphi_{k+1}) = f * A\varphi_{k+1}$. If we take the $L^2$ Fourier transform of both 
sides and use Corollary 8.8, we obtain $\mathcal{F}(Af) = M_{k+1}\widehat f$. The right side equals 
$M\widehat f$ since $\widehat f(y) = 0$ for $|y| \ge k$, and thus 

$$\mathcal{F}(Af) = M\widehat f$$

whenever $f$ is in $\mathcal{S}_0$ and $\widehat f(y) = 0$ for $|y| \ge k$. 

The subspace $C^\infty_{\mathrm{com}}(\mathbb{R}^N)$ of $L^2$ is dense by Corollary 6.19 and Theorem 6.20a. 
Since the $L^2$ Fourier transform carries $L^2$ onto $L^2$ and preserves norms (Theorems 
8.6 and 8.9), $\mathcal{S}_0$ is dense in $L^2$. Let a general $f$ in $L^2$ be given, and choose a 
sequence $\{f_j\}$ in $\mathcal{S}_0$ with $f_j \to f$ in $L^2$. Then $\mathcal{F}(Af_j) \to \mathcal{F}(Af)$ in $L^2$. By 
Theorem 5.59 we can pass to a subsequence, still written as $\{f_j\}$, so that $\mathcal{F}(f_j) \to \mathcal{F}(f)$ 
and $\mathcal{F}(Af_j) \to \mathcal{F}(Af)$ almost everywhere. Consequently 

$$\mathcal{F}(Af)(y) = \lim \mathcal{F}(Af_j)(y) = \lim M(y)\mathcal{F}(f_j)(y) = M(y)\lim \mathcal{F}(f_j)(y) = M(y)\mathcal{F}(f)(y)$$

almost everywhere. 

To see that $M$ is in $L^\infty$, suppose that $|M(y)| \ge C$ occurs at least on a set $E$ of 
positive finite measure. Then $I_E$ is in $L^2$. If we put $f = \mathcal{F}^{-1}(I_E)$, then we have 

$$\|A\|\,\|f\|_2 \ge \|Af\|_2 = \|\mathcal{F}(Af)\|_2 = \|M\mathcal{F}f\|_2 = \|MI_E\|_2 \ge C\|I_E\|_2 = C\|f\|_2,$$

and hence $\|A\| \ge C$. Therefore $\|A\| \ge \|M\|_\infty$. In particular, $M$ is in $L^\infty$. 

In the reverse direction we have $\|Af\|_2 = \|\mathcal{F}(Af)\|_2 = \|M\mathcal{F}(f)\|_2 \le 
\|M\|_\infty\|\mathcal{F}(f)\|_2 = \|M\|_\infty\|f\|_2$ for all $f$ in $L^2$, and thus $\|A\| \le \|M\|_\infty$. We 
conclude that $\|M\|_\infty = \|A\|$. This completes the proof of existence. 

If we have two candidates for the multiplier, say $M$ and $M_1$, then subtraction 
of the equations $\mathcal{F}(Af) = M\mathcal{F}(f)$ and $\mathcal{F}(Af) = M_1\mathcal{F}(f)$ shows that $0 = 
(M - M_1)\mathcal{F}(f)$ for all $f$ in $L^2$. Therefore $M = M_1$ almost everywhere. This 
proves uniqueness. 


5. Poisson Summation Formula 


The Poisson Summation Formula is a result combining Fourier series and the 
Fourier transform in a way that has remarkable applications, both pure and applied. 
Nowadays the formula is expressed as a result about Schwartz functions and 
therefore fits at this particular spot in the discussion of the Fourier transform. 

Part of the power of the formula comes about because it applies to more settings 
than originally envisioned. The Euclidean version applies to the additive group 
$\mathbb{R}^N$, the discrete subgroup of points with integer coordinates, and the quotient 
group equal to the product of circle groups. In this section we shall take $N = 1$ 
simply because a theory of Fourier series has been developed in this book only 
in one variable. 

We begin by stating and proving the 1-dimensional version of the theorem. 


Theorem 8.15 (Poisson Summation Formula). If $f$ is in the Schwartz space 
$\mathcal{S}(\mathbb{R}^1)$, then 

$$\sum_{n=-\infty}^{\infty} f(x + n) = \sum_{n=-\infty}^{\infty} \widehat f(n)\,e^{2\pi i nx}.$$


PROOF. Define $F(x) = \sum_{n=-\infty}^{\infty} f(x + n)$. From the definition of $\mathcal{S}$, it is easy 
to check that this series is uniformly convergent on any bounded interval and 
also the series of $k$-th derivatives is uniformly convergent on any bounded interval 
for each $k$. Consequently the function $F$ is well defined and smooth, and it is 
periodic of period one. We form its Fourier series, taking into consideration that 
the period is 1 rather than $2\pi$; the relevant formulas for Fourier series when the 
period is $L$ rather than $2\pi$ are 

$$f(x) \sim \sum_{n=-\infty}^{\infty} c_ne^{2\pi i nx/L} \quad\text{with}\quad c_n = \frac{1}{L}\int_{-L/2}^{L/2} f(t)\,e^{-2\pi i nt/L}\,dt.$$

A smooth periodic function is the sum of its Fourier series, and thus 

$$F(x) = \sum_{n=-\infty}^{\infty}\Big(\int_0^1 F(t)e^{-2\pi i nt}\,dt\Big)e^{2\pi i nx}. \qquad (*)$$

The Fourier coefficient in parentheses in $(*)$ is 

$$\int_0^1 F(t)e^{-2\pi i nt}\,dt = \int_0^1 \sum_{k=-\infty}^{\infty} f(t + k)e^{-2\pi i nt}\,dt$$
$$= \sum_{k=-\infty}^{\infty}\int_0^1 f(t + k)e^{-2\pi i nt}\,dt$$
$$= \sum_{k=-\infty}^{\infty}\int_k^{k+1} f(t)e^{-2\pi i nt}\,dt$$
$$= \int_{-\infty}^{\infty} f(t)e^{-2\pi i nt}\,dt$$
$$= \widehat f(n),$$

and the theorem follows by substituting this equality into $(*)$. 


Corollary 8.16. $\displaystyle \sum_{n=-\infty}^{\infty} e^{-\pi r^{-2} n^2} = r \sum_{n=-\infty}^{\infty} e^{-\pi r^2 n^2}$ for any $r > 0$. 


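PROOF. The remarks before Theorem 8.3 show that if we define $\varphi(x) = e^{-\pi x^2}$ 
and $\varphi_\varepsilon(x) = \varepsilon^{-1}\varphi(\varepsilon^{-1}x)$, then $\widehat{\varphi_\varepsilon}(y) = \hat\varphi(\varepsilon y)$. If we put $f(x) = r\varphi_r(x) = 
e^{-\pi r^{-2} x^2}$, then it follows that $\hat f(y) = r e^{-\pi r^2 y^2}$. Applying Theorem 8.15 to this $f$ 
and setting $x = 0$ gives the asserted equality. 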
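
The identity of Corollary 8.16 can be checked numerically. The sketch below is only an illustration: the function names and the truncation at $|n| \le 200$ are assumptions made here, and the three sample values of $r$ are arbitrary.

```python
import math


def gaussian_sum(c, terms=200):
    # Truncated version of  sum over n of exp(-pi * c * n^2)
    return sum(math.exp(-math.pi * c * n * n) for n in range(-terms, terms + 1))


def corollary_sides(r):
    # Returns (left side, right side) of the identity in Corollary 8.16
    left = gaussian_sum(r ** -2)
    right = r * gaussian_sum(r ** 2)
    return left, right


if __name__ == "__main__":
    for r in (0.3, 1.0, 2.0):
        print(r, corollary_sides(r))
```

For small $r$ the left side is a rapidly convergent series while the right side converges slowly, which is exactly why the identity is useful in practice.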


In one especially significant application of the 1-dimensional Euclidean version 
of the Poisson Summation Formula to pure mathematics, the remarkable identity 
in Corollary 8.16 can be combined with some complex-variable theory to obtain 
a functional equation for the Riemann zeta function, which is initially defined 
for complex s with Res > 1 by 


\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p\ \mathrm{prime}} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1}. \]

The functional equation relates $\zeta(s)$ to $\zeta(1-s)$. More precisely the function $\zeta(s)$ 
extends to be defined in a natural way² for all $s$ in $\mathbb{C} - \{1\}$, and the functional 
equation is 

\[ \Lambda(1-s) = \Lambda(s), \qquad\text{where}\qquad \Lambda(s) = \zeta(s)\,\Gamma(\tfrac12 s)\,\pi^{-\frac12 s}. \]

This implication is just the beginning of a deep theory in which Fourier analysis, 
complex-variable theory, algebraic number theory, and algebraic geometry come 
together to yield a vast array of surprising results about prime numbers. The 
derivation of the functional equation of the Riemann zeta function uses some 
complex-variable theory, and we shall not give it. 

2The natural way is as an analytic function in $\mathbb{C} - \{1\}$. The function has a simple pole at $s = 1$. 

In real-world applications the 1-dimensional Fourier transform is of great 
significance because of its interpretation in signal processing. A given function 
$f(t)$ on $\mathbb{R}^1$ is interpreted as the voltage of some signal, written as a function 
of time, and the Fourier transform $\hat f(\omega)$ gives the components of the signal at 
each frequency $\omega$. The Plancherel formula states the comforting fact that energy 
can be computed either by summing the contributions over time or by summing 
the contributions over frequency, and the result is the same. Convolutions are 
of special significance in the theory because they represent the effects of time- 
independent operations on the signal—such as the passage of the signal through 
a filter. 

To make numerical computations, one takes some discretized version of the 
signal, obtained for example by rapid sampling over a long interval of time. The 
discrete signal, which may well be obtained at $2^n$ points for some $n$, is then 
regarded as periodic. In other words, the signal is really a function on a cyclic 
group of order $2^n$. Computing a convolution involves multiplying each translate 
of one function by the other function at $2^n$ points, adding, and assembling the 
results. The number of steps is on the order of $2^{2n}$. Alternatively, a convolution 
can be computed using Fourier transforms: One computes the Fourier transform 
of each function, does a pointwise multiplication of the new functions, and then 
computes an inverse Fourier transform. The pointwise multiplication involves 
only $2^n$ steps, which is relatively trivial compared with $2^{2n}$ steps. How many 
steps are involved in the computation of a Fourier transform? Naively it would 
seem that an exponential depending on $y$ has to be multiplied by the value of 
the function at each point $x$ and the results added, hence $2^{2n}$ steps. However, 
the mechanism of the Poisson Summation Formula contains a better way of 
carrying out the computation of the Fourier transform that involves only about 
$n2^n$ steps. The algorithm in question is known as the fast Fourier transform 
and is discussed in more detail in Problems 13-18 at the end of the chapter. The 
upshot is that the Poisson Summation Formula leads to a practical device that 
cuts down enormously on the cost of analyzing signals mathematically. 
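
The two ways of computing a cyclic convolution can be compared on a small example. The sketch below uses a textbook recursive radix-2 (Cooley-Tukey, decimation-in-time) transform; this is one common formulation of the fast Fourier transform and is not claimed to be the organization used in Problems 13-18. The function names and the sample data are assumptions made here.

```python
import cmath


def fft(a, invert=False):
    # Recursive radix-2 transform; len(a) must be a power of 2.
    # With invert=True the exponent sign is flipped and NO 1/n factor is applied.
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def cyclic_convolution_direct(f, g):
    # About n^2 multiplications: (f * g)(x) = sum over y of f(x - y) g(y), indices mod n
    n = len(f)
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) for x in range(n)]


def cyclic_convolution_fft(f, g):
    # Transform, multiply pointwise, transform back, and normalize by 1/n
    n = len(f)
    product = [u * v for u, v in zip(fft(f), fft(g))]
    return [v / n for v in fft(product, invert=True)]


if __name__ == "__main__":
    f = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, 1.0]
    g = [0.5, 0.0, 0.0, 1.0, 0.0, -2.0, 0.0, 0.0]
    print(cyclic_convolution_direct(f, g))
    print([round(v.real, 10) for v in cyclic_convolution_fft(f, g)])
```

Each level of the recursion does work proportional to the length of the signal, and there are $n$ levels for a signal of length $2^n$, which accounts for the count of about $n2^n$ steps.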

Although the Poisson Summation Formula as stated in this section relates the 
real line, the subgroup of integers, and the quotient circle group, the fast Fourier 
transform iterates versions of the formula for settings that are different from this. 
The groups in question are cyclic of order $2^k$ with $k \le n$. We can take the 
subgroup to have order 2, and the quotient group then has order $2^{k-1}$. A still 
more general version of the Poisson Summation Formula applies to any “locally 
compact” abelian group with a discrete subgroup having compact quotient. This 
more general version of the formula is used in the full-fledged application to pure 
mathematics that combines Fourier analysis, complex-variable theory, algebraic 
number theory, and algebraic geometry. 
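
The finite version of the formula for a cyclic group with a subgroup of order 2 can be made concrete. In the sketch below, the normalization of the discrete Fourier transform, $\hat f(k) = \sum_x f(x) e^{-2\pi i k x/n}$, is an assumption made here, as are the function names. Summing $f$ over the cosets of the subgroup $\{0, n/2\}$ gives a function on the quotient group of order $n/2$, and its transform consists of the even-indexed values of $\hat f$.

```python
import cmath


def dft(f):
    # Naive discrete Fourier transform on the cyclic group of order len(f)
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]


def periodize(f):
    # Sum f over the cosets of the order-2 subgroup {0, n/2};
    # the result is a function on the quotient group of order n/2.
    half = len(f) // 2
    return [f[x] + f[x + half] for x in range(half)]


if __name__ == "__main__":
    f = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.5]
    print(dft(periodize(f)))   # transform on the quotient group
    print(dft(f)[0::2])        # even-indexed coefficients of the full transform
```

The two printed lists agree, which is the discrete counterpart of the statement that the Fourier coefficients of the periodized function are the values of $\hat f$ at the integers.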


6. Poisson Integral Formula 


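Let $\mathbb{R}^{N+1}_+$ be the open half space $\{(x,t) \mid x \in \mathbb{R}^N \text{ and } t > 0\}$. We view the 
boundary $\{(x,0) \mid x \in \mathbb{R}^N\}$ as $\mathbb{R}^N$. For a function $f$ in $L^p(\mathbb{R}^N)$ for $p$ equal to 1, 
2, or $\infty$, we consider the problem of finding $u(x,t)$ that is defined on $\mathbb{R}^{N+1}_+$, has 
$f$ as boundary value in a suitable sense, and is harmonic in the sense of being a 
$C^2$ function satisfying the Laplace equation $\Delta u = 0$, where $\Delta$ is the Laplacian 

\[ \Delta = \frac{\partial^2}{\partial x_1^2} + \cdots + \frac{\partial^2}{\partial x_N^2} + \frac{\partial^2}{\partial t^2}. \]
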

We studied the corresponding problem for the unit disk in a sequence of 
problems at the ends of Chapters I, III, IV, and VI. In that situation the open 
disk played the role of the open half space, and the circle played the role of the 
Euclidean-space boundary. We were able to see that the unique possible answer, 
at least if $f$ is of class $C^2$, is given by the Poisson integral formula for the unit 
disk: 

\[ u(r,\theta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta - \varphi)\, P_r(\varphi)\, d\varphi, \]

\[ \text{where}\quad P_r(\theta) = \frac{1 - r^2}{1 - 2r\cos\theta + r^2} = \sum_{n=-\infty}^{\infty} r^{|n|} e^{in\theta}. \]


The situation with $\mathbb{R}^{N+1}_+$ is different. One complication is that the boundary 
is not compact, and a discrete sum can no longer be expected. Another is that the 
harmonic function with given boundary values need not be unique; in fact, the 
function $u(x,t) = t$ is a nonzero harmonic function with boundary values given 
by $f = 0$, and thus we cannot expect to get a unique solution to a boundary-value 
problem unless we impose some further condition on $u$. In effect, the condition 
we impose will amount to a growth condition on the behavior of $u$ at infinity. A 
partial compensation for these two complications is that the boundary is now the 
Euclidean space $\mathbb{R}^N$, and dilations are available as a tool. 

Let us make a heuristic calculation to look for a harmonic function with given 
boundary values. Suppose $u(x,t)$ is the solution we seek that corresponds to 
$f$. Then we expect that the translate $\tau_{x_0}u(x,t)$ is the solution corresponding to 
$\tau_{x_0}f(x)$. We might further expect that the mapping $f \mapsto u(\,\cdot\,,t)$ is bounded on 
$L^2(\mathbb{R}^N)$. By Theorem 8.14 we would therefore have 

\[ \hat u(y,t) = m_t(y)\hat f(y) \]

for some multiplier $m_t(y)$; the Fourier transform is to be understood as occurring 
in the $x$ variable only. If $t_1 > 0$ is fixed, then $u(x,t+t_1)$ is harmonic with boundary 
value $u(x,t_1)$, and so $\hat u(y,t+t_1) = m_t(y)\hat u(y,t_1) = m_t(y)m_{t_1}(y)\hat f(y)$. The left 
side equals $m_{t+t_1}(y)\hat f(y)$, and therefore 

\[ m_{t+t_1}(y) = m_t(y)\,m_{t_1}(y). \]

Since this is only a heuristic calculation anyway, we might as well assume that $m$ 
is jointly measurable. Then we deduce that 

\[ m_t(y) = e^{t g(y)} \]


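for some function $g$. To compute $g$, we use the condition $\Delta u = 0$ more 
explicitly. Formally, as a result of the Fourier inversion formula, $u(x,t)$ is given 
as 

\[ \int_{\mathbb{R}^N} \hat u(y,t)\, e^{2\pi i x\cdot y}\, dy = \int_{\mathbb{R}^N} m_t(y)\hat f(y)\, e^{2\pi i x\cdot y}\, dy = \int_{\mathbb{R}^N} e^{t g(y)}\hat f(y)\, e^{2\pi i x\cdot y}\, dy. \]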


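Without regard to the validity of the interchange of limits, we differentiate under 
the integral sign to obtain 

\[ \frac{\partial^2}{\partial x_j^2}\, u(x,t) = -4\pi^2 \int_{\mathbb{R}^N} y_j^2\, e^{t g(y)}\hat f(y)\, e^{2\pi i x\cdot y}\, dy \]

\[ \text{and}\qquad \frac{\partial^2}{\partial t^2}\, u(x,t) = \int_{\mathbb{R}^N} g(y)^2\, e^{t g(y)}\hat f(y)\, e^{2\pi i x\cdot y}\, dy. \]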


Summing the derivatives and taking into account that $\hat f(y)$ is rather arbitrary, we 
conclude that $g(y)^2 = 4\pi^2|y|^2$. Since $m_t(y)$ is an $L^\infty$ function, we expect that 
the negative square root is to be used for all $y$. Thus $g(y) = -2\pi|y|$. Therefore 
our guess for the multiplier is 

\[ m_t(y) = e^{-2\pi t|y|}. \]

This is an $L^1$ function, and we begin our investigation of the validity of this 
answer by computing its “inverse Fourier transform,” to see what to expect for 
the form of the bounded linear operator $f \mapsto u(\,\cdot\,,t)$. 


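Lemma 8.17. For $t > 0$, 

\[ \int_{\mathbb{R}^N} e^{-2\pi t|y|}\, e^{2\pi i x\cdot y}\, dy = c_N\, \frac{t}{(t^2 + |x|^2)^{\frac12(N+1)}}, \]

where $c_N = \pi^{-\frac12(N+1)}\,\Gamma\bigl(\tfrac12(N+1)\bigr)$. 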

REMARK. The idea is to handle $t = 1$ first and then to derive the formula for 
other $t$'s by taking dilations into account. For $t = 1$, we express $e^{-2\pi|y|}$ as an 
integral of dilates of $e^{-\pi|y|^2}$, and then the integral in question will be computable 
in terms of the known inverse Fourier transforms of dilates of $e^{-\pi|y|^2}$. 


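PROOF. In one dimension, direct calculation using calculus methods on the 
intervals $[0,+\infty)$ and $(-\infty,0]$ separately gives 

\[ \int_{-\infty}^{\infty} e^{-2\pi|u|}\, e^{-2\pi i u v}\, du = \frac{1}{\pi}\,\frac{1}{1+v^2}. \]

Since $(1+v^2)^{-1}$ is integrable, the Fourier inversion formula in $\mathbb{R}^1$ (Theorem 8.4) 
then shows that 

\[ \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{1}{1+v^2}\, e^{2\pi i u v}\, dv = e^{-2\pi|u|}. \]

Putting $u = |y|$ with $y$ in $\mathbb{R}^N$ yields 

\[ e^{-2\pi|y|} = \frac{1}{\pi}\int_{-\infty}^{\infty} e^{2\pi i |y| v}\,(1+v^2)^{-1}\, dv. \tag{$*$} \]

Any $\beta > 0$ has $\beta^{-1} = \int_0^\infty e^{-\beta s}\, ds$, and hence $(1+v^2)^{-1} = \pi\int_0^\infty e^{-\pi(1+v^2)s}\, ds$. 
Substitution for $(1+v^2)^{-1}$ in $(*)$, interchange of integrals by Fubini's Theorem, 
and use of the formula in $\mathbb{R}^1$ for the inverse Fourier transform of a dilate of $e^{-\pi v^2}$ 
gives 

\[ e^{-2\pi|y|} = \int_0^\infty e^{-\pi s}\Bigl[\int_{-\infty}^{\infty} e^{2\pi i v|y|}\, e^{-\pi v^2 s}\, dv\Bigr] ds = \int_0^\infty e^{-\pi s}\, s^{-\frac12}\, e^{-\pi|y|^2/s}\, ds, \]

and this is our formula for $e^{-2\pi|y|}$ as an integral of dilates of $e^{-\pi|y|^2}$. 

We multiply both sides by $e^{2\pi i x\cdot y}$, integrate, interchange the order of integration, 
use the formula in $\mathbb{R}^N$ for the inverse Fourier transform of a dilate of $e^{-\pi|y|^2}$, 
and make a change of variables $\pi s(1+|x|^2) \to s$. The result is 

\[
\begin{aligned}
\int_{\mathbb{R}^N} e^{-2\pi|y|}\, e^{2\pi i x\cdot y}\, dy
  &= \int_0^\infty e^{-\pi s}\, s^{-\frac12}\Bigl[\int_{\mathbb{R}^N} e^{-\pi|y|^2/s}\, e^{2\pi i x\cdot y}\, dy\Bigr] ds \\
  &= \int_0^\infty e^{-\pi s}\, s^{\frac12(N-1)}\, e^{-\pi s|x|^2}\, ds \\
  &= \int_0^\infty e^{-\pi s(1+|x|^2)}\, s^{\frac12(N-1)}\, ds \\
  &= \pi^{-\frac12(N-1)}(1+|x|^2)^{-\frac12(N-1)}\,\pi^{-1}(1+|x|^2)^{-1}\int_0^\infty e^{-s}\, s^{\frac12(N-1)}\, ds \\
  &= \pi^{-\frac12(N+1)}\,\Gamma\bigl(\tfrac12(N+1)\bigr)\,(1+|x|^2)^{-\frac12(N+1)}.
\end{aligned}
\]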
The proof is completed by making use of the effect of the Fourier transform 
on dilations. We have just seen that the function $\varphi(x) = (1+|x|^2)^{-\frac12(N+1)}$ 
is integrable with Fourier transform $\hat\varphi(y) = c_N^{-1}e^{-2\pi|y|}$. Then $\varphi_t(x) = 
t^{-N}\varphi(t^{-1}x) = t(t^2+|x|^2)^{-\frac12(N+1)}$ has Fourier transform $\widehat{\varphi_t}(y) = \hat\varphi(ty) = c_N^{-1}e^{-2\pi t|y|}$. 
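
For $N = 1$ the constant is $c_1 = \pi^{-1}\Gamma(1) = 1/\pi$, and the lemma can be checked numerically. The sketch below is an illustration only: it uses composite Simpson's rule on a truncated interval, and the cutoff, the step count, and the function names are assumptions made here. By symmetry the integral equals $2\int_0^\infty e^{-2\pi t y}\cos(2\pi x y)\,dy$.

```python
import math


def simpson(func, a, b, n):
    # Composite Simpson's rule with n subintervals (n must be even)
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def lemma_left_side(x, t, n=20000):
    # Integral of e^{-2 pi t |y|} e^{2 pi i x y} over the line, reduced by symmetry
    # to twice a cosine integral over [0, cutoff]; the tail beyond the cutoff
    # is smaller than e^{-40} and is ignored.
    cutoff = 40.0 / (2 * math.pi * t)
    integrand = lambda y: math.exp(-2 * math.pi * t * y) * math.cos(2 * math.pi * x * y)
    return 2 * simpson(integrand, 0.0, cutoff, n)


def lemma_right_side(x, t):
    # c_1 * t / (t^2 + x^2) with c_1 = 1/pi
    return t / (math.pi * (t * t + x * x))


if __name__ == "__main__":
    for x, t in ((0.3, 0.5), (1.0, 1.0), (2.0, 0.25)):
        print(x, t, lemma_left_side(x, t), lemma_right_side(x, t))
```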


We define 

\[ P(x,t) = P_t(x) = \frac{c_N\, t}{(t^2 + |x|^2)^{\frac12(N+1)}} \]

for $t > 0$, with $c_N$ as in Lemma 8.17, to be the Poisson kernel for $\mathbb{R}^{N+1}_+$. The 
Poisson integral formula for $\mathbb{R}^{N+1}_+$ is $u(x,t) = (P_t * f)(x)$, and the function $u$ 
is called the Poisson integral of $f$. 


Proposition 8.18. The Poisson kernel for $\mathbb{R}^{N+1}_+$ has the following properties: 
(a) $P_t(x) = t^{-N}P_1(t^{-1}x)$, 
(b) $P_t$ is integrable with $\widehat{P_t}(y) = e^{-2\pi t|y|}$, 
(c) $P_t \ge 0$ and $\int_{\mathbb{R}^N} P_t(x)\,dx = 1$, 
(d) $P_t * P_{t'} = P_{t+t'}$, 
(e) $P(x,t)$ is harmonic in $N+1$ variables. 


PROOF. Conclusion (a) is by inspection. For (b), the formula for $P_t$ shows 
that $P_t$, for fixed $t$, is continuous and is of order $|x|^{-(N+1)}$ as $|x|$ tends to infinity. 
Therefore $P_t$ is integrable. The formula for $\widehat{P_t}$ then follows from Lemma 8.17 
and the Fourier inversion formula (Theorem 8.4). In (c), the first conclusion is by 
inspection of the formula, and the second conclusion follows from (b) by setting 
$y = 0$. Conclusion (d) follows from the corresponding formula on the Fourier 
transform side, namely $\widehat{P_t}\,\widehat{P_{t'}} = \widehat{P_{t+t'}}$, and conclusion (e) may be verified by a 
routine computation. 
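
Property (c) gives an independent check on the constant $c_N$. In polar coordinates with the substitution $r = t\tan\theta$, the integral of $P_t$ over $\mathbb{R}^N$ becomes $c_N\,|S^{N-1}|\int_0^{\pi/2}\sin^{N-1}\theta\,d\theta$, which no longer depends on $t$. The sketch below evaluates this numerically; the substitution and the formula $|S^{N-1}| = 2\pi^{N/2}/\Gamma(N/2)$ for the area of the unit sphere are standard facts assumed here rather than taken from the text, and the function names are illustrative.

```python
import math


def c(N):
    # Constant from Lemma 8.17: pi^{-(N+1)/2} * Gamma((N+1)/2)
    return math.pi ** (-(N + 1) / 2) * math.gamma((N + 1) / 2)


def sphere_area(N):
    # Surface area of the unit sphere S^{N-1} in R^N (equals 2 when N = 1)
    return 2 * math.pi ** (N / 2) / math.gamma(N / 2)


def simpson(func, a, b, n):
    # Composite Simpson's rule with n subintervals (n must be even)
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def total_mass(N, n=2000):
    # Integral of P_t over R^N after the substitution r = t tan(theta)
    angular = simpson(lambda th: math.sin(th) ** (N - 1), 0.0, math.pi / 2, n)
    return c(N) * sphere_area(N) * angular


if __name__ == "__main__":
    for N in (1, 2, 3, 5):
        print(N, total_mass(N))
```

Each printed value is 1 up to the quadrature error, in agreement with Proposition 8.18c.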


Theorem 8.19. Let $p$ be 1, 2, or $\infty$, let $f$ be in $L^p(\mathbb{R}^N)$, and let $u(x,t) = 
(P_t * f)(x)$ be the Poisson integral of $f$. Then 

(a) $u$ is harmonic in $N+1$ variables, 

(b) $\|u(\,\cdot\,,t)\|_p \le \|f\|_p$, 

(c) $u(\,\cdot\,,t)$ converges to $f$ in $L^p$ as $t$ decreases to 0 provided $p < \infty$, 

(d) $u(x,t)$ converges to $f(x)$ uniformly for $x$ in $E$ as $t$ decreases to 0 provided 
$f$ is in $L^\infty$ and $f$ is uniformly continuous at the points of $E$, 

(e) the maximal function $f^{**}(x) = \sup_{t>0}|(P_t * f)(x)|$ satisfies an inequality 
$m(\{x \mid f^{**}(x) > \xi\}) \le C\|f\|_1/\xi$ with $C$ independent of $f$ and $\xi$, 

(f) (Fatou's Theorem) $u(x,t)$ converges to $f(x)$ for almost every $x$ in $\mathbb{R}^N$. 


REMARKS. The theorem says that $u$ is harmonic and has boundary value $f$ in 
various senses. The hypothesis for (f) is really that $f$ is the sum of an $L^1$ function 
and an $L^\infty$ function, and every $L^p$ function has this property, as will be observed 
in the proof below. 


PROOF. Let us leave aside (a) for the moment. Conclusion (b) is immediate 
from Proposition 6.14 and parts (a) and (c) of Proposition 8.18. Conclusions (c) 
and (d) follow from parts (a) and (c) of Theorem 6.20. Conclusion (e) follows from 
Corollary 6.42 and the Hardy–Littlewood Maximal Theorem (Theorem 6.38), and 
conclusion (f) for $L^1$ functions $f$ is part of Corollary 6.42. Now suppose that $f$ 
is an $L^\infty$ function. Fix a bounded interval $[a,b]$ and write $f = f_1 + f_2$ with $f_1$ 
equal to 0 off $[a,b]$ and $f_2$ equal to 0 on $[a,b]$. Then $P_t * f_1$ converges almost 
everywhere to $f_1$ since $f_1$ is integrable, and $P_t * f_2$ converges to 0 everywhere on 
$(a,b)$ by (d). Hence $P_t * f$ converges almost everywhere on $(a,b)$; since $(a,b)$ 
is arbitrary, $P_t * f$ converges almost everywhere. This proves (f). 

Now we return to (a). Since $P(x,t)$ is harmonic, conclusion (a) represents 
an interchange of differentiation and convolution. The prototype of the tool we 
need is Corollary 6.19, but that result does not apply here because $P_t$ does not 
have compact support. If we break a function $f$ into two pieces, one where $|f|$ 
is $\ge 1$ and one where $|f|$ is $< 1$, we see that any $L^p$ function is the sum of an $L^1$ 
function and an $L^\infty$ function. Thus it is enough to prove (a) when $f$ is in $L^1$ or 
$L^\infty$. 

Let $\varphi$ be $P$ or one of $P$'s iterated partial derivatives of some order, let $1 \le j \le 
N+1$, and define $D_j$ to be $\partial/\partial x_j$ if $j \le N$ or $\partial/\partial t$ if $j = N+1$. It is sufficient 
to check that 

\[ h^{-1}\bigl((\varphi * f)((x,t) + he_j) - (\varphi * f)(x,t)\bigr) - ((D_j\varphi) * f)(x,t) \]

tends to 0 pointwise as $h$ tends to 0. Taking Proposition 6.15 into account, we 
see that we are to check that 

\[ \bigl(h^{-1}\bigl(\varphi((\,\cdot\,,t) + he_j) - \varphi(\,\cdot\,,t)\bigr) - (D_j\varphi)(\,\cdot\,,t)\bigr) * f(x) \]

tends to 0 as $h$ tends to 0. Proposition 6.18 shows that it is enough to have 

\[ h^{-1}\bigl(\varphi((x,t) + he_j) - \varphi(x,t)\bigr) - (D_j\varphi)(x,t) \]

tend to 0 in $L^\infty$ of the $x$ variable for each fixed $t$ if $f$ is in $L^1$, or in $L^1$ of the 
$x$ variable for fixed $t$ if $f$ is in $L^\infty$. The Mean Value Theorem shows that this 
expression is equal to 

\[ (D_j\varphi)((x,t) + h'e_j) - (D_j\varphi)(x,t) \]

for some $h'$ between 0 and $h$, with $h'$ depending on $x$ and $t$, and a second 
application of the Mean Value Theorem shows that the expression is equal to 

\[ h'\,(D_j^2\varphi)((x,t) + h''e_j). \]
We are to show that this tends to 0, for fixed $t$, uniformly in $x$ and in $L^1$ of the $x$ 
variable as $h$ tends to 0. It is enough to show for each fixed $t$ that 

\[ (D_j^2\varphi)((x,t) + he_j) \tag{$*$} \]

is dominated in absolute value by a fixed bounded function of $x$ and a fixed $L^1$ 
function of $x$ when $h$ satisfies $|h| \le \frac12\min\{1,t\}$. 

An easy induction on the degree shows that any $d^{\text{th}}$-order partial derivative of 
$P(x,t)$ is of the form $Q(x,t)(t^2+|x|^2)^{-\frac12(N+1)-d}$, where $Q(x,t)$ is a homogeneous 
polynomial in $(x,t)$ of degree $d+1$. Since any monomial of degree 1 is 
bounded by a multiple of $(t^2+|x|^2)^{1/2}$, the $d^{\text{th}}$-order partial derivative is bounded 
by a multiple of 

\[ (t^2+|x|^2)^{-\frac12(N+1)-\frac12(d-1)}. \tag{$**$} \]

Thus the desired properties of the expression $(*)$ will follow if it is shown that 
$(**)$ has these properties. This is a routine matter for $d \ge 1$, and the proof of (a) 
is complete. 


7. Hilbert Transform 


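This section concerns the Hilbert transform, the bounded linear operator $H$ on 
$L^2(\mathbb{R}^1)$ given by 

\[ \mathcal{F}(Hf)(y) = -i(\operatorname{sgn} y)(\mathcal{F}f)(y). \]

Formally this operator has the effect, for $y > 0$, of mapping exponentials by 

\[ e^{2\pi i x\cdot y} \mapsto -ie^{2\pi i x\cdot y} \qquad\text{and}\qquad e^{-2\pi i x\cdot y} \mapsto ie^{-2\pi i x\cdot y}, \]

and hence of mapping cosines and sines by 

\[ \cos(2\pi x\cdot y) \mapsto \sin(2\pi x\cdot y) \qquad\text{and}\qquad \sin(2\pi x\cdot y) \mapsto -\cos(2\pi x\cdot y). \]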


For this reason, engineers sometimes call the Hilbert transform a “90° phase 
shift.” The notion is of such importance that there is even a piece of hardware 
called a “Hilbert transformer” that takes an input signal and produces some kind 
of approximation to the Hilbert transform of the signal.³ 
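
The phase-shift behavior can be illustrated on sampled periodic signals. The sketch below is a discrete, periodic analog of the multiplier $-i\operatorname{sgn} y$, applied through a naive discrete Fourier transform; it is an assumption-laden illustration and not the operator $H$ on $L^2(\mathbb{R}^1)$ itself. The function names, the sample count, and the treatment of the zero and Nyquist frequencies (both sent to 0) are choices made here.

```python
import cmath
import math


def dft(values, sign=-1):
    # Naive discrete Fourier transform; sign=+1 gives the unnormalized inverse
    n = len(values)
    return [sum(values[x] * cmath.exp(sign * 2j * math.pi * k * x / n)
                for x in range(n)) for k in range(n)]


def discrete_hilbert(samples):
    # Multiply the coefficient at signed frequency q by -i*sgn(q), then invert
    n = len(samples)
    shifted = []
    for k, coeff in enumerate(dft(samples)):
        freq = k if 2 * k < n else k - n   # signed frequency attached to index k
        if freq == 0 or 2 * k == n:
            multiplier = 0                 # zero and Nyquist frequencies are dropped
        elif freq > 0:
            multiplier = -1j
        else:
            multiplier = 1j
        shifted.append(multiplier * coeff)
    return [v / n for v in dft(shifted, sign=+1)]


if __name__ == "__main__":
    n = 32
    cosine = [math.cos(2 * math.pi * 3 * j / n) for j in range(n)]
    image = discrete_hilbert(cosine)
    sine = [math.sin(2 * math.pi * 3 * j / n) for j in range(n)]
    print(max(abs(a - b) for a, b in zip(image, sine)))
```

The cosine samples are carried to the sine samples up to rounding, and applying the same map to the sine samples produces the negative of the cosine samples.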

We shall do some Fourier analysis in order to identify H more directly. To 
get an idea what $H$ is, we begin by computing the effect on $L^2$ of composing 
the Hilbert transform and convolution with the Poisson kernel $P_\varepsilon(x)$. Then we 
examine what happens when $\varepsilon$ decreases to 0. 


3The delay in time that a Hilbert transformer requires in producing its output imposes a built-in 
theoretical limit for how good the approximation to the Hilbert transform can be. An exact result 
would require an infinite time delay. 
Lemma 8.20. For $\varepsilon > 0$, 

\[ \int_{\mathbb{R}^1} (-i\operatorname{sgn} t)\, e^{-2\pi\varepsilon|t|}\, e^{2\pi i x t}\, dt = \frac{1}{\pi}\,\frac{x}{\varepsilon^2 + x^2}. \]


PROOF. This result follows by direct calculation, using calculus methods on 
the intervals $[0,+\infty)$ and $(-\infty,0]$ separately. 
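
For the reader who wants the direct calculation written out, one way to organize it is as follows; the splitting into the two half-lines is as in the proof, and the intermediate expressions are supplied here.

\[
\begin{aligned}
\int_0^{\infty} (-i)\, e^{-2\pi\varepsilon t}\, e^{2\pi i x t}\, dt &= \frac{-i}{2\pi(\varepsilon - ix)}, \qquad
\int_{-\infty}^{0} (i)\, e^{2\pi\varepsilon t}\, e^{2\pi i x t}\, dt = \frac{i}{2\pi(\varepsilon + ix)}, \\
\frac{i}{2\pi}\Bigl(\frac{1}{\varepsilon + ix} - \frac{1}{\varepsilon - ix}\Bigr)
  &= \frac{i}{2\pi}\cdot\frac{-2ix}{\varepsilon^2 + x^2} = \frac{1}{\pi}\,\frac{x}{\varepsilon^2 + x^2}.
\end{aligned}
\]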
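
If we define $Q(x) = \dfrac{1}{\pi}\,\dfrac{x}{1+x^2}$ for $x$ in $\mathbb{R}^1$, then $Q_\varepsilon(x) = \varepsilon^{-1}Q(\varepsilon^{-1}x) = 
\dfrac{1}{\pi}\,\dfrac{x}{\varepsilon^2+x^2}$ is the function in the statement of Lemma 8.20. We define 

\[ Q(x,\varepsilon) = Q_\varepsilon(x) = \frac{1}{\pi}\,\frac{x}{\varepsilon^2 + x^2}, \]

for $\varepsilon > 0$, to be the conjugate Poisson kernel on $\mathbb{R}^2_+$. The function $Q_\varepsilon$ is not in 
$L^1(\mathbb{R}^1)$. However, it is in $L^2(\mathbb{R}^1)$, and therefore the convolution of $Q_\varepsilon$ and any 
$L^2$ function is a well-defined bounded uniformly continuous function. For $f$ in 
$L^2$, the function $v(x,\varepsilon) = (Q_\varepsilon * f)(x)$ is called the conjugate Poisson integral 
of $f$. 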


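Proposition 8.21. The conjugate Poisson kernel for $\mathbb{R}^2_+$ has the following 
properties: 
(a) the function $v(x,y) = Q_y(x)$ is harmonic in $\mathbb{R}^2_+$, and the pair of functions 
$u$ and $v$ with $u(x,y) = P_y(x)$ satisfies the Cauchy–Riemann equations 

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad\text{and}\qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}, \]

(b) the $L^2$ Fourier transform $\mathcal{F}(Q_\varepsilon)(y)$ equals $-i(\operatorname{sgn} y)e^{-2\pi\varepsilon|y|}$, 
(c) $Q_\varepsilon * P_{\varepsilon'} = Q_{\varepsilon+\varepsilon'}$. 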


REMARKS. A fundamental result of complex-variable theory is that if $u$ and 
$v$ are $C^1$ functions on an open subset of $\mathbb{C}$ satisfying the Cauchy–Riemann 
equations, then $f(z) = u(x,y) + iv(x,y)$ is an “analytic” function in the sense 
that in any open disk about any point in the open set, $f(z)$ equals the sum of a 
power series convergent in that disk. We shall not make use of this fact, but the 
analyticity of $u + iv$ is the starting point for a great deal of analysis that will not 
be treated in this book. In the special case of the Poisson kernel and the conjugate 
Poisson kernel, the function $f$ is $f(z) = i/(\pi z)$. 


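PROOF. Part (a) is a routine calculation. 
For (b), we know that $Q_\varepsilon$ is in $L^2$ and has an $L^2$ Fourier transform $g = \mathcal{F}(Q_\varepsilon)$. 
The inverse Fourier transform $\mathcal{F}^{-1}$ on $L^2$ satisfies $\mathcal{F}^{-1}(g) = Q_\varepsilon$, and (b) will 
follow if we show that $\mathcal{F}^{-1}(f) = Q_\varepsilon$, where $f(t) = -i(\operatorname{sgn} t)e^{-2\pi\varepsilon|t|}$. For each 
integer $n > 0$, let $f_n(t)$ be $f(t)$ for $|t| \le n$ and 0 for $|t| > n$. Then $f_n \to f$ in $L^2$ 
by dominated convergence, and hence $\mathcal{F}^{-1}(f_n) \to \mathcal{F}^{-1}(f)$ in $L^2$. By Theorem 
5.59 a subsequence of $\mathcal{F}^{-1}(f_n)$ converges almost everywhere to $\mathcal{F}^{-1}(f)$. Since 
$f$ is in $L^1$, Lemma 8.20 shows that $\mathcal{F}^{-1}(f_n)(x) = \int_{-n}^{n} f(t)e^{2\pi i x t}\,dt$ converges 
pointwise to $Q_\varepsilon(x)$, and therefore $\mathcal{F}^{-1}(f) = Q_\varepsilon$. 

For (c), Corollary 8.8 shows that $\mathcal{F}(Q_\varepsilon * P_{\varepsilon'}) = \mathcal{F}(Q_\varepsilon)\mathcal{F}(P_{\varepsilon'})$. In combination 
with Proposition 8.18b, conclusion (b) of the present proposition gives 
$\mathcal{F}(Q_\varepsilon)(y)\mathcal{F}(P_{\varepsilon'})(y) = -i(\operatorname{sgn} y)e^{-2\pi(\varepsilon+\varepsilon')|y|}$ a.e., and this is $\mathcal{F}(Q_{\varepsilon+\varepsilon'})(y)$ a.e. 
by a second application of (b). 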


Theorem 8.22. Let $f$ be in $L^2(\mathbb{R}^1)$, and let $u(x,y) = (P_y * f)(x)$ and 
$v(x,y) = (Q_y * f)(x)$ be the Poisson integral and conjugate Poisson integral of 
$f$. Then 

(a) the function $v$ is harmonic in $\mathbb{R}^2_+$, and the pair of functions $u$ and $v$ satisfies 
the Cauchy–Riemann equations, 

(b) the function $Q_\varepsilon * f$ is in $L^2(\mathbb{R}^1)$ for every $\varepsilon > 0$, and its $L^2$ Fourier 
transform is $\mathcal{F}(Q_\varepsilon * f)(y) = -i(\operatorname{sgn} y)e^{-2\pi\varepsilon|y|}\mathcal{F}(f)(y)$, 

(c) $\|Q_\varepsilon * f\|_2 = \|P_\varepsilon * f\|_2 \le \|f\|_2$ for every $\varepsilon > 0$, 

(d) $Q_\varepsilon * f \to H(f)$ in $L^2$ as $\varepsilon$ decreases to 0. 


PROOF. Conclusion (a) is handled just like Theorem 8.19a. In the proof of 
Theorem 8.19a, the integrability of $P_t$ did not play a role; it was the integrability 
of the iterated partial derivatives of $P_t$ (i.e., the case $d > 0$) that was important. 
The estimates involving such derivatives here are the same as in that case. 

For (b), put $g = \mathcal{F}(Q_\varepsilon)\mathcal{F}(f)$. This is an $L^2$ function since $\mathcal{F}(Q_\varepsilon)$ is in 
$L^\infty$ by inspection and since $\mathcal{F}(f)$ is in $L^2$ by the Plancherel formula. Define 
$f_n = I_{B(n;0)}f$, so that each $f_n$ is in $L^1 \cap L^2$ and also $f_n \to f$ in $L^2$. Since $\mathcal{F}(Q_\varepsilon)$ 
is in $L^\infty$, the Plancherel formula shows that $g_n = \mathcal{F}(Q_\varepsilon)\mathcal{F}(f_n)$ is in $L^2$ for each 
$n$ and converges to $g$ in $L^2$. Since $f_n$ is in $L^1$ and $Q_\varepsilon$ is in $L^2$, Corollary 8.8 gives 
$\mathcal{F}(Q_\varepsilon * f_n) = \mathcal{F}(Q_\varepsilon)\mathcal{F}(f_n) = g_n$ for all $n$. Thus $Q_\varepsilon * f_n = \mathcal{F}^{-1}(g_n)$. We now 
let $n$ tend to infinity. We know that $\|Q_\varepsilon * f_n - Q_\varepsilon * f\|_\infty \le \|Q_\varepsilon\|_2\|f_n - f\|_2$. 
Since $Q_\varepsilon$ is in $L^2$ and $f_n \to f$ in $L^2$, $Q_\varepsilon * f_n$ converges uniformly to $Q_\varepsilon * f$. On 
the other hand, $\mathcal{F}^{-1}(g_n)$ converges to $\mathcal{F}^{-1}(g)$ in $L^2$, and Theorem 5.59 shows 
that a subsequence converges almost everywhere. Therefore $\mathcal{F}^{-1}(g) = Q_\varepsilon * f$. 
Consequently $\mathcal{F}(Q_\varepsilon * f) = g = \mathcal{F}(Q_\varepsilon)\mathcal{F}(f)$, and we obtain $\mathcal{F}(Q_\varepsilon * f)(y) = 
-i(\operatorname{sgn} y)e^{-2\pi\varepsilon|y|}\mathcal{F}(f)(y)$. 

Conclusions (c) and (d) follow by taking $L^2$ Fourier transforms and using (b), 
Proposition 8.18b, and the Plancherel Theorem. This completes the proof. 
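
To get a more direct formula for the Hilbert transform, we introduce the 
functions 

\[ h(x) = \begin{cases} \dfrac{1}{\pi x} & \text{for } |x| \ge 1, \\[1ex] 0 & \text{for } |x| < 1, \end{cases}
\qquad\text{and}\qquad
h_\varepsilon(x) = \varepsilon^{-1}h(\varepsilon^{-1}x) = \begin{cases} \dfrac{1}{\pi x} & \text{for } |x| \ge \varepsilon, \\[1ex] 0 & \text{for } |x| < \varepsilon. \end{cases} \]

Let $\psi(x) = Q(x) - h(x)$, so that $\psi_\varepsilon(x) = \varepsilon^{-1}\psi(\varepsilon^{-1}x) = Q_\varepsilon(x) - h_\varepsilon(x)$. 

Lemma 8.23. The function $\psi$ on $\mathbb{R}^1$ is integrable, and $\int_{\mathbb{R}^1}\psi(x)\,dx = 0$. 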


PROOF. For $|x| < 1$, we have $\psi(x) = Q(x) = \pi^{-1}x/(1+x^2)$. This is a 
continuous odd function, and therefore it is integrable on $[-1,1]$ with integral 0. 
For $|x| \ge 1$, we have $\psi(x) = \pi^{-1}\bigl(\frac{x}{1+x^2} - \frac{1}{x}\bigr) = -\pi^{-1}\frac{1}{x(1+x^2)}$. This is an 
integrable function for $|x| \ge 1$; since it is an odd function, its integral is 0. 
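
The two pieces in the proof also make it possible to compute the $L^1$ norm of $\psi$ exactly. The following calculation is supplied here as an illustration and uses only the formulas for $\psi$ displayed in the proof.

\[
\begin{aligned}
\int_{|x|<1} |\psi(x)|\,dx &= \frac{2}{\pi}\int_0^1 \frac{x}{1+x^2}\,dx = \frac{1}{\pi}\log 2, \\
\int_{|x|\ge 1} |\psi(x)|\,dx &= \frac{2}{\pi}\int_1^\infty \Bigl(\frac{1}{x} - \frac{x}{1+x^2}\Bigr)dx
  = \frac{2}{\pi}\Bigl[\log\frac{x}{\sqrt{1+x^2}}\Bigr]_1^\infty = \frac{1}{\pi}\log 2, \\
\|\psi\|_1 &= \frac{2\log 2}{\pi} \approx 0.441.
\end{aligned}
\]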


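Theorem 8.24. Let $h_\varepsilon$ be defined as above. If $f$ is in $L^2(\mathbb{R}^1)$, then $h_\varepsilon * f$ is 
in $L^2(\mathbb{R}^1)$ for every $\varepsilon > 0$, and $h_\varepsilon * f \to H(f)$ in $L^2$ as $\varepsilon$ decreases to 0. 


REMARKS. More concretely the limit relation in the theorem is that 

\[ Hf(x) = \lim_{\varepsilon\downarrow 0}\ \frac{1}{\pi}\int_{|t|\ge\varepsilon} \frac{f(x-t)}{t}\,dt \qquad (\text{in } L^2 \text{ sense}). \]

The integrand on the right side is the product of two $L^2$ functions on the set where 
$|t| \ge \varepsilon$, and it is integrable by the Schwarz inequality. 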


PROOF. We have $h_\varepsilon * f = Q_\varepsilon * f - \psi_\varepsilon * f$. The term $Q_\varepsilon * f$ is in $L^2$ by 
Theorem 8.22b, and the term $\psi_\varepsilon * f$ is in $L^2$ by Lemma 8.23 and Proposition 
6.14. As $\varepsilon$ decreases to 0, $Q_\varepsilon * f$ tends to $Hf$ in $L^2$ by Theorem 8.22d, and 
$\psi_\varepsilon * f$ tends to 0 in $L^2$ by Theorem 6.20a. This completes the proof. 


Now that we have the concrete formula of Theorem 8.24 for the Hilbert transform 
on $L^2$ functions, we can ask whether the Hilbert transform is meaningful on 
other kinds of functions. For example, we could ask, If we have some other vector 
space $V$ of functions and $V \cap L^2(\mathbb{R}^1)$ is dense in $V$, can we extend $H$ to $V$? The 
answer for $V = L^1(\mathbb{R}^1)$ is unfortunately negative. In fact, if $f$ is in $L^1 \cap L^2$, then 
the Fourier transform $\hat f$ will be continuous and the Fourier transform of $Hf$ will 
have to be $-i(\operatorname{sgn} y)\hat f$. If $\hat f(0)$ is nonzero, then $-i(\operatorname{sgn} y)\hat f$ is not continuous 
and cannot be the Fourier transform of an $L^1$ function. 

However, in Chapter IX we shall introduce $L^p$ spaces for $1 \le p \le \infty$, thereby 
extending the definitions we have already made for $p$ equal to 1, 2, and $\infty$. Toward 
the end of the chapter, we shall see that the Hilbert transform makes sense as a 
bounded linear operator on $L^p(\mathbb{R}^1)$ for $1 < p < \infty$. This boundedness is an 
indication that the Hilbert transform is not a completely wild transformation, and 
the result in question will be used in the problems at the end of Chapter IX to 
prove that the partial sums of the Fourier series of an $L^p$ function on the circle 
converge to the function in $L^p$ as long as $1 < p < \infty$. 

Actually, this boundedness on $L^p$ will be a consequence of a substitute result 
about $L^1$ that we shall prove now. Although the Hilbert transform is not a bounded 
linear operator on $L^1$, its approximations in the statement of Theorem 8.24 are 
of weak type (1, 1), in the same sense that the passage from a function to its 
Hardy–Littlewood maximal function in Chapter VI was of weak type (1, 1). 


Theorem 8.25. Let $h_1$ be the function on $\mathbb{R}^1$ equal to $1/(\pi x)$ for $|x| \ge 1$ and 
equal to 0 for $|x| < 1$. For $f$ in $L^1(\mathbb{R}^1) + L^2(\mathbb{R}^1)$, define 

\[ H_1 f(x) = h_1 * f(x) = \frac{1}{\pi}\int_{|t|\ge 1}\frac{f(x-t)}{t}\,dt \]

as the convolution of the fixed function $h_1$ in $L^2$ with a function $f$ that is the sum 
of an $L^1$ function and an $L^2$ function. Then 

\[ \|H_1 f\|_2 \le A\|f\|_2 \]

with the constant $A$ independent of $f$, and 

\[ m\{x \in \mathbb{R}^1 \mid |H_1 f(x)| > \xi\} \le \frac{C\|f\|_1}{\xi} \]

for every $\xi > 0$, with the constant $C$ independent of $\xi$ and $f$. 


REMARK. This result about the approximation $H_1$ to $H$ on $L^1$ and $L^2$ will be 
enough for now. The result for $L^1$ is much more difficult than the result for $L^2$. 
In the next chapter we shall derive from Theorem 8.25 a boundedness theorem 
for all the other approximations $H_\varepsilon = h_\varepsilon * (\,\cdot\,)$ on $L^p(\mathbb{R}^1)$ for $1 < p < \infty$, with a 
bound independent of $\varepsilon$, and then it will be an easy matter to get the boundedness 
of the Hilbert transform $H$ itself on $L^p$ for these values of $p$. 


PROOF. A preliminary fact is needed that involves a computation with the 
function $h_1$. We need to know that 

\[ \int_{|x|\ge 2r} |h_1(x+r') - h_1(x)|\,dx \le 6 \tag{$*$} \]

whenever $0 < |r'| \le r$. To see this, we break the region of integration into four 
sets: one where $|x| \ge 2r$, $|x| \ge 1$, and $|x+r'| \ge 1$; a second where $|x| \ge 2r$, 
$|x| < 1$, and $|x+r'| \ge 1$; a third where $|x| \ge 2r$, $|x| \ge 1$, and $|x+r'| < 1$; and a 
fourth where $|x| \ge 2r$, $|x| < 1$, and $|x+r'| < 1$. For the fourth piece the integrand 
is 0. For the second and third pieces, the integrand is $\le 1$ in absolute value, and 
the set has measure $\le 2$; hence each of these pieces contributes at most 2. For the 
first piece the absolute value of the integrand is $|r'|/|\pi x(x+r')| \le 2r/x^2$; thus 
the absolute value of the integral is $\le \int_{|x|\ge 2r} 2r/x^2\,dx = 2$. This proves $(*)$. 

It is an easy matter to prove that $H_1$ is a bounded linear operator on $L^2$. In 
fact, $h_1 = Q_1 - \psi$, and $\psi$ is in $L^1$ by Lemma 8.23. Thus Theorem 8.22c gives 
$\|H_1 f\|_2 \le \|Q_1 * f\|_2 + \|\psi * f\|_2 \le \|f\|_2 + \|\psi\|_1\|f\|_2$. In other words, $H_1$ is 
bounded on $L^2$ with $\|H_1\| \le 1 + \|\psi\|_1$. Put $A = 1 + \|\psi\|_1$. 

The heart of the proof is the observation that if $F$ is in $L^1$, vanishes off a bounded 
interval $I$ with center $y_0$ and double⁴ $I^*$, and has total integral $\int_I F(y)\,dy$ equal 
to 0, then 

\[ \|H_1 F\|_{L^1(\mathbb{R} - I^*)} \le 6\|F\|_1. \tag{$**$} \]

To see this, we use the fact that the total integral of $F$ is 0 to write 

\[ H_1 F(x) = \int_I h_1(x-y)F(y)\,dy = \int_I [h_1(x-y) - h_1(x-y_0)]F(y)\,dy. \]

Taking the absolute value of both sides and integrating over $\mathbb{R} - I^*$, we obtain 

\[
\begin{aligned}
\int_{\mathbb{R} - I^*} |H_1 F(x)|\,dx
  &\le \int_{x\notin I^*}\int_{y\in I} |h_1(x-y) - h_1(x-y_0)|\,|F(y)|\,dy\,dx \\
  &= \int_{y\in I}\Bigl[\int_{x\notin I^*} |h_1(x-y) - h_1(x-y_0)|\,dx\Bigr]|F(y)|\,dy \\
  &\le 6\int_{y\in I} |F(y)|\,dy,
\end{aligned}
\]

the last step holding by $(*)$. This proves $(**)$. 

Let the $L^1$ function $f$ be given. Fix $\xi > 0$. We shall decompose the $L^1$ 
function $f$ into the sum $f = g + b$ of a “good” function $g$ and a “bad” function $b$, 
in a manner dependent on $\xi$. The good function will be in $L^\infty$ and hence will be 
in $L^1 \cap L^\infty \subseteq L^2$; the effect of applying $H_1$ to it will be controlled by the bound 
of $H_1$ on $L^2$. The bad function will be nonzero on a set of rather small measure, 
and we shall be able to control the effect of $H_1$ on it by means of $(**)$. 

We begin by constructing a disjoint countable system of bounded open intervals 
$I_k$ such that 

(i) $\sum_k m(I_k) \le 5\|f\|_1/\xi$, 
(ii) $|f(x)| \le \xi$ almost everywhere off $\bigcup_k I_k$, 
(iii) $\dfrac{1}{m(I_k)}\int_{I_k} |f(y)|\,dy \le 2\xi$ for each $k$. 

4The “double” of a bounded interval $I$ is an interval of twice the length of $I$ and the same center. 

Namely, let $f^*(x) = \sup_{h>0}\frac{1}{2h}\int_{(x-h,x+h)} |f(t)|\,dt$ be the Hardy–Littlewood 
maximal function of $f$, and let $E$ be the set where $f^*(x) > \xi$. The set $E$ is 
open. In fact, if $f^*(x) > \xi$, then $\frac{1}{2h}\int_{(x-h,x+h)} |f(t)|\,dt \ge \xi + \varepsilon$ for some $h > 0$ and some 
$\varepsilon > 0$. For $\delta > 0$, the inequality $\frac{1}{2h}\int_{(x-h,x+h+2\delta)} |f(t)|\,dt \ge \xi + \varepsilon$ shows that 
$f^*(x+\delta) \ge \frac{h}{h+\delta}(\xi + \varepsilon)$. Hence $f^*(x+\delta) > \xi$ for $\delta$ sufficiently small. Similarly 
$f^*(x-\delta) > \xi$ for $\delta$ sufficiently small. 

Since $E$ is open, $E$ is uniquely the disjoint union of countably many open 
intervals, and these intervals will be the sets $I_k$. The disjointness of the $I_k$'s and 
the Hardy–Littlewood Maximal Theorem (Theorem 6.38) together give 

\[ \sum_k m(I_k) = m(E) \le 5\|f\|_1/\xi, \]

and this proves (i) and the boundedness of the intervals. The a.e. differentiability 
of integrals (Corollary 6.39) shows that $|f(x)| \le f^*(x)$ a.e., and therefore 
$|f(x)| \le \xi$ a.e. off $E = \bigcup_k I_k$. This proves (ii). If $I = (a,b)$ is one of the $I_k$'s, 
then $a$ is not in $E$, and we must have $\frac{1}{2(b-a)}\int_{[a-(b-a),\,b]} |f(t)|\,dt \le f^*(a) \le \xi$. 
Therefore $\frac{1}{b-a}\int_{[a,b]} |f(t)|\,dt \le 2\xi$. This proves (iii). 

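With the open intervals $I_k$ in hand, we define the decomposition $f = g + b$ by 

\[ g(x) = \begin{cases} \dfrac{1}{m(I_k)}\displaystyle\int_{I_k} f(y)\,dy & \text{for } x \in I_k, \\[2ex] f(x) & \text{for } x \notin \bigcup_k I_k, \end{cases} \]

\[ b(x) = \begin{cases} f(x) - \dfrac{1}{m(I_k)}\displaystyle\int_{I_k} f(y)\,dy & \text{for } x \in I_k, \\[2ex] 0 & \text{for } x \notin \bigcup_k I_k. \end{cases} \]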


Since $\{x \mid |H_1 f(x)| > \xi\} \subseteq \{x \mid |H_1 g(x)| > \xi/2\} \cup \{x \mid |H_1 b(x)| > \xi/2\}$, it is 
enough to prove 
• $m(\{x \mid |H_1 g(x)| > \xi/2\}) \le C'\|f\|_1/\xi$ and 
• $m(\{x \mid |H_1 b(x)| > \xi/2\}) \le C'\|f\|_1/\xi$ 
for some constant $C'$ independent of $\xi$ and $f$. 

The definition of $g$ shows that $\int_{I_k} |g(x)|\,dx \le \int_{I_k} |f(x)|\,dx$ for all $k$ and 
that $|g(x)| = |f(x)|$ for $x \notin \bigcup_k I_k$; therefore $\int_{\mathbb{R}} |g(x)|\,dx \le \int_{\mathbb{R}} |f(x)|\,dx$. 
Also, properties (ii) and (iii) of the $I_k$'s show that $|g(x)| \le 2\xi$ a.e. These two 
inequalities, together with the bound $\|H_1 g\|_2 \le A\|g\|_2$, give 

\[ \int_{\mathbb{R}} |H_1 g(x)|^2\,dx \le A^2\int_{\mathbb{R}} |g(x)|^2\,dx \le 2\xi A^2\int_{\mathbb{R}} |g(x)|\,dx \le 2\xi A^2\int_{\mathbb{R}} |f(x)|\,dx. \]
Combining this result with Chebyshev's inequality $m(\{x \mid |F(x)| > \beta\}) \le 
\beta^{-2}\int_{\mathbb{R}} |F(x)|^2\,dx$ for the function $F = H_1 g$ and the number $\beta = \xi/2$, we obtain 

\[ m(\{x \mid |H_1 g(x)| > \xi/2\}) \le \frac{4}{\xi^2}\, 2\xi A^2\int_{\mathbb{R}} |f(x)|\,dx = \frac{8A^2\|f\|_1}{\xi}. \]

This proves the bulleted item about $g$. 

For $b$, let $b_k$ be the product of $b$ with the indicator function of $I_k$. Then we 
have $b = \sum_k b_k$ with the sum convergent in $L^1$. Since $H_1$ is convolution by the 
$L^2$ function $h_1$, $H_1 b = \sum_k H_1 b_k$ with the sum convergent in $L^2$. Lumping terms 
via Theorem 5.59 if necessary, we may assume that the convergence takes place 
a.e. Therefore $|H_1 b(x)| \le \sum_k |H_1 b_k(x)|$ a.e. Using monotone convergence and 
$(**)$, we conclude that 

\[
\begin{aligned}
\|H_1 b\|_{L^1(\mathbb{R} - \bigcup_j I_j^*)}
  &\le \sum_k \|H_1 b_k\|_{L^1(\mathbb{R} - \bigcup_j I_j^*)} \\
  &\le \sum_k \|H_1 b_k\|_{L^1(\mathbb{R} - I_k^*)} \le 6\sum_k \|b_k\|_1 = 6\|b\|_1 \le 12\|f\|_1,
\end{aligned}
\]

the last step holding because $\|b\|_1 \le \|f\|_1 + \|g\|_1 \le 2\|f\|_1$. 
Thus $m(\{x \in \mathbb{R} - \bigcup_k I_k^* \mid |H_1 b(x)| > \xi/2\}) \le 12\|f\|_1/(\xi/2) = 24\|f\|_1/\xi$. 
Since $m(\bigcup_k I_k^*) \le 2\sum_k m(I_k) \le 10\|f\|_1/\xi$ by (i), we obtain $m(\{x \mid |H_1 b(x)| > \xi/2\}) \le 
34\|f\|_1/\xi$, and the bulleted item about $b$ follows. 


8. Problems 


1. For each of the following alternative definitions of the Fourier transform in $\mathbb{R}^N$, 
find a constant $\alpha$ such that the Fourier inversion formula is as indicated, and find 
a constant $\beta$ such that when convolution is defined by 

\[ f * g(x) = \beta\int_{\mathbb{R}^N} f(x-t)g(t)\,dt, \]

then the Fourier transform of the convolution is the product of the Fourier 
transforms. 

(a) Fourier transform $\hat f(y) = \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx$ and inverse Fourier transform 
$f(x) = \alpha\int_{\mathbb{R}^N} \hat f(y)e^{ix\cdot y}\,dy$. 

(b) Fourier transform $\hat f(y) = (2\pi)^{-N}\int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx$ and inverse Fourier 
transform $f(x) = \alpha\int_{\mathbb{R}^N} \hat f(y)e^{ix\cdot y}\,dy$. 

(c) Fourier transform $\hat f(y) = (2\pi)^{-N/2}\int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx$ and inverse Fourier 
transform $f(x) = \alpha\int_{\mathbb{R}^N} \hat f(y)e^{ix\cdot y}\,dy$. 

2. Let $(u,v)_2 = \int_{\mathbb{R}^N} u(x)\overline{v(x)}\,dx$ if $u$ and $v$ are in $L^2(\mathbb{R}^N)$, and let $\mathcal{F}$ denote the 
Fourier transform on $L^2(\mathbb{R}^N)$. Prove for every pair of functions $f$ and $g$ in $L^2$ 
that $(f,g)_2 = (\mathcal{F}(f), \mathcal{F}(g))_2$. 


3. Prove that the Poisson kernel $P$ and the conjugate Poisson kernel $Q$ for $\mathbb{R}^2_+$ 
satisfy the identity $Q_\varepsilon * Q_{\varepsilon'} = -P_{\varepsilon+\varepsilon'}$. 


4. This problem is an analog for the Fourier transform of Problem 20c of Chapter VI 
concerning Fourier series and weak-star convergence. Weak-star convergence 
was defined in Section V.9. 

(a) If $f$ is in $L^\infty(\mathbb{R}^N)$ and $P_t$ is the Poisson kernel, prove that $P_t * f$ converges 
to $f$ weak-star against $L^1(\mathbb{R}^N)$ as $t$ decreases to 0. In other words, prove 
that $\lim_{t\downarrow 0}\int_{\mathbb{R}^N} (P_t * f)(x)g(x)\,dx = \int_{\mathbb{R}^N} f(x)g(x)\,dx$ for every $g$ in $L^1$. 

(b) Theorem 8.19b shows that $\|P_t * f\|_\infty \le \|f\|_\infty$ if $f$ is in $L^\infty(\mathbb{R}^N)$. Prove 
that $\lim_{t\downarrow 0}\|P_t * f\|_\infty = \|f\|_\infty$. 

5. Let $M^+(\mathbb{R}^N)$ be the space of finite Borel measures on $\mathbb{R}^N$. This problem
introduces convolution and the Poisson integral formula for $M^+(\mathbb{R}^N)$. Each
finite Borel measure on $\mathbb{R}^N$ defines, by means of integration, a bounded linear
functional on the normed linear space $C_{\mathrm{com}}(\mathbb{R}^N)$ equipped with the supremum
norm, and thus it is meaningful to speak of weak-star convergence of such
measures against $C_{\mathrm{com}}(\mathbb{R}^N)$.

(a) The convolution of a finite Borel measure $\mu$ on $\mathbb{R}^N$ with an integrable
function $f$ is defined by $(f * \mu)(x) = \int_{\mathbb{R}^N} f(x - y)\,d\mu(y)$. Define
the convolution $\mu = \mu_1 * \mu_2$ of two members of $M^+(\mathbb{R}^N)$ by $\mu(E) =
\int_{\mathbb{R}^N} \mu_1(E - x)\,d\mu_2(x)$ for all Borel sets $E$. Check that the result is a Borel
measure and that the definition for $f\,dx * \mu$, i.e., for the situation in which
$\mu_1$ and $\mu_2$ are specialized so that $\mu_1 = f\,dx$ and $\mu_2 = \mu$, is consistent with
the definition in the special case.

(b) With convolution of finite Borel measures on $\mathbb{R}^N$ defined as in (a), prove
that $\int_{\mathbb{R}^N} g\,d(\mu_1 * \mu_2) = \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} g(x + y)\,d\mu_1(x)\,d\mu_2(y)$ for every bounded
Borel function $g$ on $\mathbb{R}^N$.

(c) Verify that $\|P_t * \mu\|_1 \le \mu(\mathbb{R}^N)$ if $\mu$ is in $M^+(\mathbb{R}^N)$. Prove the limit formula
$\lim_{t \downarrow 0} \|P_t * \mu\|_1 = \mu(\mathbb{R}^N)$.

(d) If $\mu$ is in $M^+(\mathbb{R}^N)$, prove that the measures $(P_t * \mu)(x)\,dx$ converge to $\mu$
weak-star against $C_{\mathrm{com}}(\mathbb{R}^N)$ as $t$ decreases to 0. In other words, prove that
$\lim_{t \downarrow 0} \int_{\mathbb{R}^N} (P_t * \mu)(x)\,g(x)\,dx = \int_{\mathbb{R}^N} g(x)\,d\mu(x)$ for every $g$ in $C_{\mathrm{com}}(\mathbb{R}^N)$.


Problems 6-12 examine the Fourier transform of a measure in $M^+(\mathbb{R}^N)$, ultimately
proving “Bochner’s theorem” characterizing the “positive definite functions” on $\mathbb{R}^N$.
They take for granted the Helly-Bray Theorem, i.e., the statement that if $\{\mu_n\}$
is a sequence in $M^+(\mathbb{R}^N)$ with $\mu_n(\mathbb{R}^N)$ bounded, then there is a subsequence $\{\mu_{n_k}\}$
convergent to some member $\mu$ of $M^+(\mathbb{R}^N)$ weak-star against $C_{\mathrm{com}}(\mathbb{R}^N)$. The Helly-Bray
Theorem will be proved in something like this form in Chapter XI.

6. If $\mu$ is in $M^+(\mathbb{R}^N)$, the Fourier transform of $\mu$ is defined to be the function

\[ \hat\mu(y) = \int_{\mathbb{R}^N} e^{-2\pi i x\cdot y}\,d\mu(x). \]

(a) Prove that $\hat\mu$ is bounded and continuous.

(b) Prove that the Fourier transform of the delta measure at 0 does not vanish at
infinity.

(c) Prove that $\widehat{\mu_1 * \mu_2} = \hat\mu_1 \hat\mu_2$ when convolution of finite measures is defined
as in Problem 5.

(d) By forming $\varphi_\varepsilon * \mu$, prove that $\hat\mu$ can equal 0 for some $\mu$ in $M^+(\mathbb{R}^N)$ only
if $\mu = 0$.

7. A continuous function $F : \mathbb{R}^N \to \mathbb{C}$ is called positive definite if for each
finite set of points $x_1, \dots, x_k$ in $\mathbb{R}^N$ and corresponding system of complex
numbers $\xi_1, \dots, \xi_k$, the inequality $\sum_{i,j} F(x_i - x_j)\,\xi_i \overline{\xi_j} \ge 0$ holds. Prove that
the continuous function $F$ is positive definite if and only if the inequality
$\int_{\mathbb{R}^N}\int_{\mathbb{R}^N} F(x - y)\,g(x)\overline{g(y)}\,dx\,dy \ge 0$ holds for each member $g$ of $C_{\mathrm{com}}(\mathbb{R}^N)$.

8. Prove that the Fourier transform of any member $\mu$ of $M^+(\mathbb{R}^N)$ is a positive
definite function.

9. Using sets of one and then two elements $x_i$ in the definition of positive definite,
prove that a positive definite function $F$ must have $F(0) \ge 0$ and $|F(x)| \le F(0)$
for all $x$.

10. Suppose that $F$ is positive definite, that $\varphi \ge 0$ is in $L^1(\mathbb{R}^N)$, and that $\Phi(x) =
\int_{\mathbb{R}^N} e^{2\pi i x\cdot y}\,\varphi(y)\,dy$. Prove that $F(x)\Phi(x)$ is positive definite.

11. Suppose that $F$ is positive definite. Let $\varepsilon > 0$, and let $\varphi$ be as in Problem 10, so
that $\varphi(x) = \varepsilon^{-N} e^{-\pi \varepsilon^{-2}|x|^2}$ and $\Phi(x) = e^{-\pi \varepsilon^2 |x|^2}$.

(a) The function $F_0(x) = F(x)\Phi(x)$ is positive definite by Problem 10. Prove
that it is integrable.

(b) Using Problem 2 and the alternative definition of positive definite in Problem 7,
prove that $\int_{\mathbb{R}^N} \hat F_0(y)\,|\hat g(y)|^2\,dy \ge 0$ for every function $g$ in $C_{\mathrm{com}}(\mathbb{R}^N)$.

(c) Deduce from (b) that the function $f_0 = \hat F_0$ is $\ge 0$.

(d) Conclude from (c) that $f_0$ is integrable with $\int_{\mathbb{R}^N} f_0\,dy = F(0)$, hence that
$f_0(y)\,dy$ is a finite Borel measure.

12. (Bochner’s Theorem) By combining the results of the previous problems with
the Helly-Bray Theorem, prove that each positive definite function on $\mathbb{R}^N$ is the
Fourier transform of a finite Borel measure.
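
As a numerical illustration of Problems 8 and 9, and not as a solution to them, the following Python sketch takes a measure on $\mathbb{R}$ made of finitely many point masses, forms its Fourier transform, and evaluates the quadratic form from the definition of positive definite. The helper names `transform_of_point_masses` and `quadratic_form`, the restriction to dimension one, and the random test data are all assumptions made for the illustration.

```python
import cmath
import random

def transform_of_point_masses(atoms, masses):
    """Fourier transform y -> sum_k m_k * exp(-2*pi*i*x_k*y) of sum_k m_k * delta_{x_k} on R."""
    def mu_hat(y):
        return sum(m * cmath.exp(-2j * cmath.pi * x * y) for x, m in zip(atoms, masses))
    return mu_hat

def quadratic_form(F, points, coeffs):
    """Return sum_{i,j} F(y_i - y_j) * xi_i * conj(xi_j)."""
    total = 0j
    for yi, ci in zip(points, coeffs):
        for yj, cj in zip(points, coeffs):
            total += F(yi - yj) * ci * cj.conjugate()
    return total

random.seed(0)
atoms = [random.uniform(-1.0, 1.0) for _ in range(5)]
masses = [random.uniform(0.0, 1.0) for _ in range(5)]  # nonnegative masses (assumed data)
F = transform_of_point_masses(atoms, masses)

points = [random.uniform(-3.0, 3.0) for _ in range(6)]
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(6)]
q = quadratic_form(F, points, coeffs)

# The form should be real and nonnegative up to rounding error.
print(q.real >= -1e-9, abs(q.imag) < 1e-9)
# Problem 9 predicts |F(y)| <= F(0) for every y.
print(all(abs(F(y)) <= F(0).real + 1e-12 for y in points))
```

The check works because the quadratic form equals $\sum_k m_k \bigl|\sum_i \xi_i e^{-2\pi i x_k y_i}\bigr|^2$, which is the discrete version of the computation that Problem 8 asks for.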


Problems 13-18 concern a version of the Fourier transform for finite abelian groups,
along with the Poisson Summation Formula in that setting. They show for a cyclic
group of order $m = pq$ that the use of the idea behind the Poisson Summation
Formula makes it possible to compute a Fourier transform in about $pq(p + q)$ steps
rather than the expected $m^2 = p^2q^2$ steps. This savings may be iterated in the case
of a cyclic group of order $2^n$ so that the Fourier transform is computed in about $n2^n$
steps rather than the expected $2^{2n}$ steps. An organized algorithm to implement this
method of computation is known as the fast Fourier transform.


13. Let $G$ be a finite abelian group. A multiplicative character $\chi$ of $G$ is a
homomorphism of $G$ into the circle group $\{e^{i\theta}\}$. If $f$ and $g$ are two complex-valued
functions on $G$, their $L^2$ inner product is defined to be $\sum_{t \in G} f(t)\overline{g(t)}$.

(a) Prove that the set of multiplicative characters of $G$ forms an abelian group
under pointwise multiplication, the identity element being the constant function
1 and the inverse of $\chi$ being $\overline{\chi}$. This group $\hat G$ is called the dual group
of $G$.

(b) Prove that distinct multiplicative characters are orthogonal and hence that
the members of $\hat G$ form a linearly independent set.

(c) Let $J_m$ be the cyclic group $\{0, 1, 2, \dots, m-1\}$ of integers modulo $m$ under
addition, and let $\zeta_m = e^{2\pi i/m}$. For $n$ in $J_m$ define a multiplicative character $\chi_n$
of $J_m$ by $\chi_n(k) = (\zeta_m^n)^k$. Prove that the resulting $m$ multiplicative characters
exhaust $\hat J_m$ and that $\chi_n \chi_{n'} = \chi_{n+n'}$. Therefore $\hat J_m$ is isomorphic to $J_m$. For
Problems 16-18 below, it will be convenient to identify $\chi_n$ with $\chi_n(1) = \zeta_m^n$.

(d) If $G$ is a direct sum of cyclic groups of orders $m_1, \dots, m_r$, use (c) to exhibit
$\prod_{j=1}^r m_j$ distinct members of $\hat G$. Using (b) and the theorem that every finite
abelian group is the direct sum of cyclic groups, conclude for any finite
abelian group $G$ that these members of $\hat G$ exhaust $\hat G$ and form a basis of
$L^2(G)$.

14. Let $G$ be a finite abelian group, and let $\hat G$ be its dual group. The Fourier
transform of a function $f$ in $L^2(G)$ is the function $\hat f$ on $\hat G$ given by $\hat f(\chi) =
\sum_{t \in G} f(t)\overline{\chi(t)}$. Prove that the Fourier transform mapping carries $L^2(G)$ one-one
onto $L^2(\hat G)$ and that the correct analog of the Fourier inversion formula is
$f(t) = |G|^{-1} \sum_{\chi \in \hat G} \hat f(\chi)\chi(t)$, where $|G|$ is the order of $G$.

15. Let $G$ be a finite abelian group, let $H$ be a subgroup, and let $G/H$ be the quotient
group. If $t$ is in $G$, write $\dot t$ for the coset of $t$ in $G/H$. Let $f$ be in $L^2(G)$ and define
$F(\dot t) = \sum_{h \in H} f(t + h)$ as a function on $G/H$. Suppose that $\chi$ is a member of
$\hat G$ that is identically 1 on $H$, so that $\chi$ descends to a member $\dot\chi$ of $\widehat{G/H}$. By
imitating steps in the proof of Theorem 8.15, prove that $\hat f(\chi) = \hat F(\dot\chi)$.

16. Now suppose that $G = J_m$ with $m = pq$; here $p$ and $q$ need not be relatively
prime. Let $H = \{0, q, 2q, \dots, (p-1)q\}$ be the subgroup of $G$ isomorphic to $J_p$,
so that $G/H = \{0, 1, 2, \dots, q-1\}$ is isomorphic to $J_q$. Prove that the characters
$\chi$ of $G$ identified as in Problem 13c with $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$ are the ones
that are identically 1 on $H$ and therefore descend to characters of $G/H$. Verify
that the descended characters $\dot\chi$ are the ones identified with $\zeta_q^0, \zeta_q^1, \zeta_q^2, \dots, \zeta_q^{q-1}$.
Consequently the formula $\hat f(\chi) = \hat F(\dot\chi)$ of the previous problem provides a way
of computing $\hat f$ at $\zeta_m^0, \zeta_m^p, \zeta_m^{2p}, \dots, \zeta_m^{(q-1)p}$ from the values of $\hat F$. Show that if
$\hat F$ is computed from the definition of Fourier transform in Problem 14, then the
number of steps involved in its computation is about $q^2$, apart from a constant
factor. Show therefore that the total number of steps in computing $\hat f$ at these
special values of $\chi$ is on the order of $q^2 + pq$.

17. In the previous problem show for each $k$ with $0 \le k \le p-1$ that the value of $\hat f$ at
$\zeta_m^k, \zeta_m^{p+k}, \zeta_m^{2p+k}, \dots, \zeta_m^{(q-1)p+k}$ can be handled in the same way with a different
$F$ by replacing $f$ by a suitable variant of $f$. Doing so for each $k$ requires $p$ times
the number of steps detected in the previous problem, and therefore all of $\hat f$ can
be computed in about $p(q^2 + pq) = pq(p + q)$ steps.

18. Show how iteration of this process to compute the Fourier transform of each $F$,
together with further iteration of this process, allows one to compute a Fourier
transform for $J_{m_1 m_2 \cdots m_r}$ in about $m_1 m_2 \cdots m_r (m_1 + m_2 + \cdots + m_r)$ steps.
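
The single reduction step described in Problems 15-17 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an organized fast Fourier transform: the function names `dft` and `dft_by_cosets` are hypothetical, the "suitable variant of $f$" is taken to be $f$ multiplied by the conjugate of the character $\chi_k$, and the inner transforms on $J_q$ are computed directly from the definition in Problem 14.

```python
import cmath

def dft(values):
    """Fourier transform on J_n from the definition: fhat(chi_k) = sum_t f(t) * conj(chi_k(t))."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def dft_by_cosets(f, p, q):
    """Fourier transform on J_{pq} obtained from p transforms of coset sums on J_q."""
    m = p * q
    assert len(f) == m
    fhat = [0j] * m
    for k in range(p):
        # Variant of f: multiplying by conj(chi_k) moves the frequencies
        # k, p+k, 2p+k, ... of f to the frequencies 0, p, 2p, ... of the variant.
        twisted = [f[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m)]
        # F(t-dot) = sum over the subgroup H = {0, q, 2q, ..., (p-1)q}: about pq steps.
        F = [sum(twisted[s + j * q] for j in range(p)) for s in range(q)]
        # Transform on G/H = J_q from the definition: about q^2 steps.
        Fhat = dft(F)
        for j in range(q):
            fhat[j * p + k] = Fhat[j]
    return fhat

# Compare against the direct m^2-step computation for m = 12 = 3 * 4.
f = [complex((t * t) % 7, t % 3) for t in range(12)]
direct = dft(f)
fast = dft_by_cosets(f, p=3, q=4)
max_error = max(abs(a - b) for a, b in zip(direct, fast))
print(max_error < 1e-9)
```

Each pass of the outer loop costs about $q^2 + pq$ operations, and there are $p$ passes, which matches the count $pq(p+q)$ in Problem 17. Replacing the inner call to `dft` by a recursive call is the iteration that Problem 18 refers to.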


CHAPTER IX 


$L^p$ Spaces


Abstract. This chapter extends the theory of the spaces $L^1$, $L^2$, and $L^\infty$ to include a whole family
of spaces $L^p$, $1 \le p \le \infty$, in order to be able to capture finer quantitative facts about the size of
measurable functions and the effect of linear operators on such functions.

Sections 1-2 give the basics about $L^p$. For general measure spaces these consist of Hölder’s
inequality, Minkowski’s inequality, a completeness theorem, and related results. For Euclidean
space they include also facts about convolution.

Sections 3-4 develop some tools that at first may seem quite unrelated to $L^p$ spaces but play
a significant role in Section 5. These are the Radon-Nikodym Theorem and two decomposition
theorems for additive set functions. The Radon-Nikodym Theorem gives a sufficient condition for
writing a measure as a function times another measure.

Section 5 identifies the space of continuous linear functionals on $L^p$ for $1 \le p < \infty$ when
the underlying measure is $\sigma$-finite. For one thing this identification makes Alaoglu’s Theorem in
Chapter V concrete enough so as to be quite useful.

Section 6 discusses the Marcinkiewicz Interpolation Theorem, which allows one to reinterpret
suitably bounded operators between two pairs of $L^p$ spaces as bounded between intermediate pairs
of $L^p$ spaces as well. The theorem has immediate corollaries for the Hardy-Littlewood maximal
function and an approximation to the Hilbert transform, and Section 6 goes on to use each of these
corollaries to derive interesting consequences.


1. Inequalities and Completeness 


In the context of any measure space, we introduced in Section V.9 the spaces $L^1$,
$L^2$, and $L^\infty$. Since then, we have used these three spaces to capture quantitative
facts about the size of measurable functions. The construction in each case
involved introducing a certain pseudonorm in a vector space of functions, thereby
making the vector space into a pseudo normed linear space and in particular a
pseudometric space. The corresponding metric space obtained from the construction
of Proposition 2.12 was $L^1$, $L^2$, or $L^\infty$ in the respective cases. For each of
the three, the vector-space structure for the pseudometric space yielded a vector-space
structure for the metric space, and $L^1$, $L^2$, and $L^\infty$ were normed linear
spaces. As was true in Chapters V and VI, it continues in the present chapter
to be largely a matter of indifference whether the functions in question are real
valued or complex valued, hence whether the scalars for these vector spaces are
real or complex.

Now we shall enlarge the family consisting of $L^1$, $L^2$, $L^\infty$ to a family $L^p$ for
$1 \le p \le \infty$ in order to be able to capture finer quantitative facts about the size
of measurable functions. Enlarging the family in this way makes it possible to
get better insight into the behavior of specific operators and to make more helpful
estimates with partial differential equations.

Let $(X, \mathcal{A}, \mu)$ be a measure space. We have already dealt with $p = \infty$. For
$1 \le p < \infty$, we consider the set $V = V_p$ of measurable functions $f$ on $X$ such that
$\int_X |f|^p\,d\mu$ is finite. This integral is well defined; in fact, $f$ measurable implies
$|f|$ measurable, and also, for $c > 0$, $(|f|^p)^{-1}(c, +\infty) = |f|^{-1}(c^{1/p}, +\infty)$. The
set $V$ is in fact a vector space of functions. It is certainly closed under scalar
multiplication; let us see that it is closed under addition. If $f$ and $g$ are in $V$, then
we have

\[
\begin{aligned}
(|f(x)| + |g(x)|)^p &\le \bigl(\max\{|f(x)|, |g(x)|\} + \max\{|f(x)|, |g(x)|\}\bigr)^p \\
&= 2^p \max\{|f(x)|^p, |g(x)|^p\} \le 2^p |f(x)|^p + 2^p |g(x)|^p
\end{aligned}
\]

for every $x$ in $X$. Integrating over $X$, we see that $f + g$ is in $V$ if $f$ and $g$ are
in $V$.

Following the construction of the prototypes $L^1$ and $L^2$ in Section V.9, we
introduce the expression $\|f\|_p = \bigl(\int_X |f|^p\,d\mu\bigr)^{1/p}$ for $f$ in $V_p$. We would like
$\|\cdot\|_p$ to be a pseudonorm in the sense of satisfying

(i) $\|x\|_p \ge 0$ for all $x \in V_p$,

(ii) $\|cx\|_p = |c|\,\|x\|_p$ for all scalars $c$ and all $x \in V_p$,

(iii) $\|x + y\|_p \le \|x\|_p + \|y\|_p$ for all $x$ and $y$ in $V_p$.


Properties (i) and (ii) are certainly satisfied, but a little argument is needed to
verify (iii). We return to this matter in a moment. Once the function $\|\cdot\|_p$ on the
vector space $V_p$ is known to be a pseudonorm, $V_p$ meets the conditions of being
a pseudo normed linear space in the sense of Section V.9.

We can pass to the set of equivalence classes just as in that section, and this set
is defined to be $L^p$ or $L^p(X)$ or $L^p(X, \mu)$. The equivalence class of 0 is again
the set of all functions vanishing almost everywhere. The function $\|\cdot\|_p$ is well
defined on $L^p$, and $L^p$ is a normed linear space. In particular, it has the structure
of a metric space. This handles $1 \le p < \infty$, and the space $L^\infty$ was constructed
in Section V.9.

As is true with $L^1$, $L^2$, and $L^\infty$, one sometimes relaxes the terminology and
works with the members of $L^p(X)$ as if they were functions, saying, “Let the
function $f$ be in $L^p(X)$” or “Let $f$ be an $L^p$ function.” There is little possibility
of ambiguity in using such expressions.

Let us return to property (iii) above. This will be proved as Minkowski’s
inequality below. But first we prove a numerical lemma and then “Hölder’s
inequality,” which is a version for $L^p$ of the Schwarz inequality for $L^2$. Hölder’s
inequality makes use of the dual index $p'$ to $p$, defined by the equality $\frac{1}{p} + \frac{1}{p'} = 1$.
The dual index to 1 is $\infty$, and vice versa. The index 2 is its own dual.


Lemma 9.1. If $s$, $t$, $\alpha$, and $\beta$ are real numbers $\ge 0$ with $\alpha + \beta = 1$, then
\[ s^\alpha t^\beta \le \alpha s + \beta t. \]

PROOF. If any of $s$, $t$, $\alpha$, $\beta$ is 0, the result is certainly true. If all are nonzero,
consider the function
\[ f(x) = \alpha x^{\alpha - 1} + (1 - \alpha)x^{\alpha}, \]
defined for $x > 0$. The derivative $f'(x) = (1 - \alpha)\alpha x^{\alpha - 2}(x - 1)$ is $< 0$ for
$0 < x < 1$, is $= 0$ for $x = 1$, and is $> 0$ for $x > 1$. Therefore $f(x)$ assumes its
absolute minimum value for $x = 1$. Since $f(1) = 1$, we have
\[ 1 \le \alpha x^{\alpha - 1} + (1 - \alpha)x^{\alpha} = \alpha x^{-\beta} + \beta x^{\alpha} \]
for all $x > 0$. The lemma follows by putting $x = t/s$ in this inequality and by
multiplying both sides by $s^\alpha t^\beta$.
REMARK. Alternatively, this lemma can be proved by Lagrange multipliers in 


the same way that Problem 20 at the end of Chapter III suggested using for the 
arithmetic-geometric mean inequality. 
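
As a numerical sanity check of Lemma 9.1, and not a substitute for the proof, the following Python sketch samples random values of $s$, $t$, and $\alpha$ and records the gap $\alpha s + \beta t - s^\alpha t^\beta$. The function name `weighted_mean_gap` and the sampling ranges are assumptions chosen for the illustration.

```python
import random

def weighted_mean_gap(s, t, alpha):
    """Return (alpha*s + beta*t) - s**alpha * t**beta with beta = 1 - alpha."""
    beta = 1.0 - alpha
    return alpha * s + beta * t - (s ** alpha) * (t ** beta)

random.seed(1)
gaps = [
    weighted_mean_gap(random.uniform(0.01, 10.0),
                      random.uniform(0.01, 10.0),
                      random.uniform(0.01, 0.99))
    for _ in range(1000)
]

# The lemma predicts a nonnegative gap; a tiny tolerance absorbs rounding error.
print(min(gaps) >= -1e-12)
# The proof shows that the minimum occurs at x = t/s = 1, so s = t gives equality.
print(abs(weighted_mean_gap(3.0, 3.0, 0.3)) < 1e-12)
```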


Theorem 9.2 (Hölder’s inequality). Let $(X, \mathcal{A}, \mu)$ be any measure space, let
$1 \le p \le \infty$, and let $p'$ be the dual index to $p$. If $f$ is in $L^p$ and $g$ is in $L^{p'}$, then
$fg$ is in $L^1$, and
\[ \|fg\|_1 \le \|f\|_p \|g\|_{p'}. \]

REMARK. The inequality holds trivially if $\|f\|_p = +\infty$ or $\|g\|_{p'} = +\infty$.


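PROOF. We already know the result if $p = 1$ and $p' = \infty$ or the other way
around. Thus suppose that $p > 1$ and $p' > 1$. We may assume that neither $f$
nor $g$ is 0 almost everywhere. Then we can apply Lemma 9.1 with $\alpha = p^{-1}$,
$\beta = p'^{-1}$,
\[ s = \frac{|f(x)|^p}{\int_X |f|^p\,d\mu}, \quad\text{and}\quad t = \frac{|g(x)|^{p'}}{\int_X |g|^{p'}\,d\mu}, \]
getting
\[ \frac{|f(x)g(x)|}{\|f\|_p \|g\|_{p'}} \le \frac{|f(x)|^p}{p \int_X |f|^p\,d\mu} + \frac{|g(x)|^{p'}}{p' \int_X |g|^{p'}\,d\mu}. \]
Integrating, we obtain
\[ \frac{\|fg\|_1}{\|f\|_p \|g\|_{p'}} \le \frac{1}{p} + \frac{1}{p'} = 1, \]
and the conclusions of the theorem follow.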
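
Hölder's inequality can be tested numerically on a small measure space. The Python sketch below is only an illustration under assumptions: the measure space is a finite set whose points carry the positive masses in `mu`, and the helper names `dual_index` and `lp_norm` are hypothetical.

```python
import random

INF = float("inf")

def dual_index(p):
    """Return p' with 1/p + 1/p' = 1; the indices 1 and infinity are dual to each other."""
    if p == 1:
        return INF
    if p == INF:
        return 1.0
    return p / (p - 1.0)

def lp_norm(f, mu, p):
    """L^p norm of f on a finite set whose points carry the positive masses mu."""
    if p == INF:
        return max(abs(v) for v in f)
    return sum(abs(v) ** p * m for v, m in zip(f, mu)) ** (1.0 / p)

random.seed(2)
mu = [random.uniform(0.1, 2.0) for _ in range(8)]
f = [random.uniform(-5.0, 5.0) for _ in range(8)]
g = [random.uniform(-5.0, 5.0) for _ in range(8)]

holder_holds = []
for p in (1, 1.5, 2, 3, 10, INF):
    lhs = lp_norm([a * b for a, b in zip(f, g)], mu, 1)       # ||fg||_1
    rhs = lp_norm(f, mu, p) * lp_norm(g, mu, dual_index(p))   # ||f||_p ||g||_p'
    holder_holds.append(lhs <= rhs * (1 + 1e-12))

print(all(holder_holds))
```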

Theorem 9.3 (Minkowski’s inequality). Let $(X, \mathcal{A}, \mu)$ be any measure space,
and let $1 \le p \le \infty$. If $f$ and $g$ are in $L^p$, then $f + g$ is in $L^p$ and
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p. \]

REMARK. The theorem assumes the usual convention that $f + g$ is made to
be 0 at any point $x$ where $f(x) + g(x)$ is not defined. The set where this change
occurs is of measure 0 since $f$ and $g$ have to be finite almost everywhere to be in
$L^p$.


PROOF. We have already seen that $f + g$ is in $L^p$, and we know the inequality
for $p = 1$ and $p = \infty$ from Section V.9. For $1 < p < \infty$, let $p'$ be the dual
index. We apply Hölder’s inequality (Theorem 9.2) to $f$ and $|f + g|^{p-1}$ and to
$g$ and $|f + g|^{p-1}$ to obtain

\[
\begin{aligned}
\int_X |f + g|^p\,d\mu &= \int_X |f + g|\,|f + g|^{p-1}\,d\mu \\
&\le \int_X |f|\,|f + g|^{p-1}\,d\mu + \int_X |g|\,|f + g|^{p-1}\,d\mu \\
&\le \|f\|_p \Bigl(\int_X |f + g|^{(p-1)p'}\,d\mu\Bigr)^{1/p'} + \|g\|_p \Bigl(\int_X |f + g|^{(p-1)p'}\,d\mu\Bigr)^{1/p'} \\
&= \Bigl(\int_X |f + g|^p\,d\mu\Bigr)^{1/p'} \bigl(\|f\|_p + \|g\|_p\bigr),
\end{aligned}
\]

the last step holding because $(p - 1)p' = p$. If $\|f + g\|_p = 0$, the inequality of
the theorem is certainly true. Otherwise the inequality of the theorem follows
after dividing the inequality of the display by $\bigl(\int_X |f + g|^p\,d\mu\bigr)^{1/p'}$, which we
know to be finite, and using the fact that $1 - \frac{1}{p'} = \frac{1}{p}$.
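
The following Python sketch checks Minkowski's inequality, together with the identity $(p-1)p' = p$ used in the last step of the proof, on a finite set with point masses. It is a numerical illustration only; the finite measure space, the random data, and the name `lp_norm` are assumptions.

```python
import random

def lp_norm(f, mu, p):
    """L^p norm (1 <= p < infinity) of f on a finite set with positive point masses mu."""
    return sum(abs(v) ** p * m for v, m in zip(f, mu)) ** (1.0 / p)

random.seed(3)
mu = [random.uniform(0.1, 2.0) for _ in range(8)]
f = [random.uniform(-5.0, 5.0) for _ in range(8)]
g = [random.uniform(-5.0, 5.0) for _ in range(8)]

minkowski = {}
for p in (1.0, 1.5, 2.0, 4.0):
    norm_of_sum = lp_norm([a + b for a, b in zip(f, g)], mu, p)
    sum_of_norms = lp_norm(f, mu, p) + lp_norm(g, mu, p)
    minkowski[p] = (norm_of_sum, sum_of_norms)

print(all(lhs <= rhs * (1 + 1e-12) for lhs, rhs in minkowski.values()))

# The exponent identity (p - 1) p' = p for a few values of p > 1.
exponent_gaps = [abs((p - 1.0) * (p / (p - 1.0)) - p) for p in (1.5, 2.0, 4.0)]
print(max(exponent_gaps) < 1e-12)
```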


Thus $L^p$ is a normed linear space for $1 \le p \le \infty$. Let us derive some of its
properties.


Proposition 9.4. Let $(X, \mathcal{A}, \mu)$ be a measure space, and let $1 \le p < \infty$. Then
every indicator function of a set of finite measure is in $L^p(X)$, and the smallest
closed subspace of $L^p(X)$ containing all such indicator functions is $L^p(X)$ itself.
Consequently

(a) the set of simple functions built from sets of finite measure lies in every
$L^p(X)$ for $1 \le p \le \infty$ and is dense in $L^p(X)$ if $1 \le p < \infty$,
(b) $1 \le p_1 \le p \le p_2 \le \infty$ and $p < \infty$ together imply that $L^{p_1}(X) \cap L^{p_2}(X)$
is dense in $L^p(X)$.
In addition,
(c) $1 \le p_1 \le p \le p_2 \le \infty$ implies that $L^p(X) \subseteq L^{p_1}(X) + L^{p_2}(X)$.
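
PROOF. The conclusion in the second sentence of the proposition is proved by
the same argument as for Proposition 5.56. Part (a) then follows from Proposition
5.55d. Part (b) follows by combining these two results once it is known that
$L^{p_1}(X) \cap L^{p_2}(X) \subseteq L^p(X)$. For this inclusion let $f$ be in $L^{p_1}(X) \cap L^{p_2}(X)$. We
may assume that $p < \infty$. If $p_2 < \infty$, then

\[
\begin{aligned}
\int_X |f|^p\,d\mu &= \int_{\{|f| > 1\}} |f|^p\,d\mu + \int_{\{|f| \le 1\}} |f|^p\,d\mu \\
&\le \int_{\{|f| > 1\}} |f|^{p_2}\,d\mu + \int_{\{|f| \le 1\}} |f|^{p_1}\,d\mu < +\infty,
\end{aligned}
\]

and hence $f$ is in $L^p(X)$. If $p_2 = \infty$, then $\{|f| > 1\}$ has finite measure since $f$
is in $L^{p_1}$ and $p_1 < \infty$. Thus

\[
\begin{aligned}
\int_X |f|^p\,d\mu &= \int_{\{|f| > 1\}} |f|^p\,d\mu + \int_{\{|f| \le 1\}} |f|^p\,d\mu \\
&\le \|f\|_\infty^p\,\mu(\{|f| > 1\}) + \int_{\{|f| \le 1\}} |f|^{p_1}\,d\mu < +\infty,
\end{aligned}
\]

and again $f$ is in $L^p(X)$. This completes the proof of (b).

For (c), let $f$ be in $L^p$, and write $f = f_1 + f_2$, where

\[
f_1(x) = \begin{cases} f(x) & \text{if } |f(x)| > 1, \\ 0 & \text{otherwise}, \end{cases}
\quad\text{and}\quad
f_2(x) = \begin{cases} f(x) & \text{if } |f(x)| \le 1, \\ 0 & \text{otherwise}. \end{cases}
\]

Then

\[ \int_X |f_1|^{p_1}\,d\mu = \int_{\{|f| > 1\}} |f|^{p_1}\,d\mu \le \int_{\{|f| > 1\}} |f|^p\,d\mu < \infty \]

shows that $f_1$ is in $L^{p_1}(X)$. It is apparent that $f_2$ is in $L^\infty(X)$, and thus $f_2$ is
certainly in $L^{p_2}(X)$ if $p_2 = \infty$. If $p_2 < \infty$, then

\[ \int_X |f_2|^{p_2}\,d\mu = \int_{\{|f| \le 1\}} |f|^{p_2}\,d\mu \le \int_{\{|f| \le 1\}} |f|^p\,d\mu < \infty \]

shows that $f_2$ is in $L^{p_2}(X)$. This proves (c).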


Hölder’s inequality allows us to prove the following supplement to the conclusions
of Proposition 9.4.


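Proposition 9.5. Let $(X, \mathcal{A}, \mu)$ be any measure space. Let $1 \le p_1 \le p \le p_2 \le \infty$,
and define $t$ with $0 \le t \le 1$ by $\frac{1}{p} = \frac{1-t}{p_1} + \frac{t}{p_2}$. Then

\[ \|f\|_p \le \|f\|_{p_1}^{1-t}\,\|f\|_{p_2}^{t}. \]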


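PROOF. First suppose that $p_2 < \infty$. Since $\frac{1}{p} \ge \frac{1-t}{p_1}$, we can find $b$ with
$1 \le b \le +\infty$ such that $\frac{1}{bp} = \frac{1-t}{p_1}$. If $b'$ denotes the dual index, then
$\frac{1}{b'p} = \frac{1}{p}\bigl(1 - \frac{1}{b}\bigr) = \frac{1}{p} - \frac{1-t}{p_1} = \frac{t}{p_2}$. Define $a$ by the equation $ab = p_1$. Then

\[ (p - a)b' = \Bigl(p - \frac{p_1}{b}\Bigr)\frac{p_2}{tp} = p_2\Bigl(\frac{1}{t} - \frac{p_1}{tbp}\Bigr) = p_2\Bigl(\frac{1}{t} - \frac{1-t}{t}\Bigr) = p_2. \]

We write $|f|^p = |f|^a |f|^{p-a}$. Application of Hölder’s inequality with index $b$
and dual index $b'$ gives $\int |f|^p\,d\mu \le \bigl(\int |f|^{ab}\,d\mu\bigr)^{1/b} \bigl(\int |f|^{(p-a)b'}\,d\mu\bigr)^{1/b'}$, and
hence

\[ \|f\|_p \le \Bigl(\int |f|^{ab}\,d\mu\Bigr)^{1/(bp)} \Bigl(\int |f|^{(p-a)b'}\,d\mu\Bigr)^{1/(b'p)}. \]

We have seen that $ab = p_1$, $1/(bp) = (1-t)/p_1$, $(p-a)b' = p_2$, and $1/(b'p) =
t/p_2$. Thus the inequality reads $\|f\|_p \le \|f\|_{p_1}^{1-t}\,\|f\|_{p_2}^{t}$, and the proof is complete
when $p_2 < \infty$.

When $p_2 = \infty$, we write $|f|^p = |f|^{p_1}|f|^{p-p_1}$. Replacing $|f|^{p-p_1}$ by its
essential supremum gives $\int |f|^p\,d\mu \le \|f\|_\infty^{p-p_1} \int |f|^{p_1}\,d\mu$ and hence $\|f\|_p$ is

\[ \le \Bigl(\int |f|^{p_1}\,d\mu\Bigr)^{1/p} \|f\|_\infty^{(p-p_1)/p} = \Bigl(\int |f|^{p_1}\,d\mu\Bigr)^{\frac{1}{p_1}\cdot\frac{p_1}{p}} \|f\|_\infty^{1 - p_1/p} = \|f\|_{p_1}^{1-t}\,\|f\|_\infty^{t}. \]

This completes the proof when $p_2 = \infty$.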
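
The interpolation inequality of Proposition 9.5 can be illustrated numerically. In the Python sketch below, the finite measure space with point masses `mu`, the helper names `lp_norm` and `interpolation_parameter`, and the choice of indices are assumptions; only the case of finite $p_1 < p_2$ is covered.

```python
import random

def lp_norm(f, mu, p):
    """L^p norm (1 <= p < infinity) of f on a finite set with positive point masses mu."""
    return sum(abs(v) ** p * m for v, m in zip(f, mu)) ** (1.0 / p)

def interpolation_parameter(p1, p, p2):
    """Solve 1/p = (1 - t)/p1 + t/p2 for t, assuming p1 < p2 are both finite."""
    return (1.0 / p1 - 1.0 / p) / (1.0 / p1 - 1.0 / p2)

random.seed(4)
mu = [random.uniform(0.1, 2.0) for _ in range(10)]
f = [random.uniform(-5.0, 5.0) for _ in range(10)]

p1, p, p2 = 1.0, 2.0, 4.0
t = interpolation_parameter(p1, p, p2)
lhs = lp_norm(f, mu, p)
rhs = lp_norm(f, mu, p1) ** (1.0 - t) * lp_norm(f, mu, p2) ** t

print(abs((1.0 - t) / p1 + t / p2 - 1.0 / p) < 1e-12)  # t solves the defining equation
print(lhs <= rhs * (1 + 1e-12))                         # ||f||_p <= ||f||_p1^(1-t) ||f||_p2^t
```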


We have already made serious use of the completeness of $L^p$ for $p$ equal to 1,
2, and $\infty$ as proved in Theorem 5.58. As might be expected, this result extends
to be valid for the other values of $p$.


Theorem 9.6. Let $(X, \mathcal{A}, \mu)$ be any measure space, and let $1 \le p \le \infty$.
Any Cauchy sequence $\{f_k\}$ in $L^p$ has a subsequence $\{f_{k_n}\}$ such that $\|f_{k_m} - f_{k_n}\|_p
\le C_{\min\{m,n\}}$ with $\sum_n C_n < +\infty$. A subsequence $\{f_{k_n}\}$ with this property is
necessarily Cauchy pointwise almost everywhere. If $f$ denotes the almost-everywhere
limit of $\{f_{k_n}\}$, then the original sequence $\{f_k\}$ converges to $f$ in
$L^p$. Consequently the space $L^p$, when regarded as a metric space, is complete in
the sense that every Cauchy sequence converges.


REMARK. As in the case with $p$ equal to 1, 2, and $\infty$, the detail is important.
The detailed statement of the theorem allows us to conclude, among other things,
that if a sequence of functions is convergent in $L^{p_1}$ and in $L^{p_2}$, then the limit
functions in the two spaces are equal almost everywhere.


PROOF. We may assume that $p < \infty$, the case $p = \infty$ having been handled
in Theorem 5.58. The argument for $1 \le p < \infty$ is word-for-word the same as in
the proof for $p = 1$ and $p = 2$ of Theorem 5.58.


In Section V.9 the inequality $\|f + g\|_p \le \|f\|_p + \|g\|_p$ for $p$ equal to 1, 2,
or $\infty$ says in words that “the norm of a sum is $\le$ the sum of the norms.” In that
section we obtained a generalization for those values of $p$, saying that “the norm
of an integral is $\le$ the integral of the norms.” The generalization continues to be
valid for the other $p$’s under study; the proof amounts to a direct derivation from
Hölder’s inequality.

Theorem 9.7 (Minkowski’s inequality for integrals). Let $(X, \mathcal{A}, \mu)$ and
$(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces, and let $1 \le p \le \infty$. If $f$ is measurable on
$X \times Y$, then

\[ \Bigl\| \int_X f(x, y)\,d\mu(x) \Bigr\|_{p,\,d\nu(y)} \le \int_X \|f(x, y)\|_{p,\,d\nu(y)}\,d\mu(x) \]

in the following sense: The integrand on the right side is measurable. If the
integral on the right is finite, then for almost every $y$ $[d\nu]$ the integral on the left
is defined; when it is redefined to be 0 for the exceptional $y$’s, then the formula
holds.


PROOF. Theorem 5.60 handles $p = 1$ and $p = \infty$, and we may assume that
$1 < p < \infty$. The measurability question is handled for $1 < p < \infty$ in the same
way as in Theorem 5.60 for $p = 2$. In proving the inequality, we may assume
without loss of generality that $f \ge 0$. The generalization of the computation in
the proof of Theorem 9.3 makes use of Fubini’s Theorem and proceeds as follows:

\[
\begin{aligned}
\int_Y \Bigl| \int_X f(x, y)\,d\mu(x) \Bigr|^p d\nu(y)
&= \int_Y \Bigl[ \int_X f(x, y)\,d\mu(x) \Bigr] \Bigl[ \int_X f(x', y)\,d\mu(x') \Bigr]^{p-1} d\nu(y) \\
&= \int_X \Bigl[ \int_Y f(x, y) \Bigl( \int_X f(x', y)\,d\mu(x') \Bigr)^{p-1} d\nu(y) \Bigr] d\mu(x) \\
&\le \int_X \Bigl( \int_Y |f(x, y)|^p\,d\nu(y) \Bigr)^{1/p} \\
&\qquad \times \Bigl( \int_Y \Bigl| \int_X f(x', y)\,d\mu(x') \Bigr|^{(p-1)p'} d\nu(y) \Bigr)^{1/p'} d\mu(x) \\
&= \Bigl( \int_X \|f(x, y)\|_{p,\,d\nu(y)}\,d\mu(x) \Bigr) \Bigl( \int_Y \Bigl| \int_X f(x', y)\,d\mu(x') \Bigr|^p d\nu(y) \Bigr)^{1/p'}.
\end{aligned}
\]

The next-to-last step uses Hölder’s inequality (Theorem 9.2), and the last step
uses the fact that $(p - 1)p' = p$.

In order to complete the proof, we need to be able to divide by the factor
$\bigl( \int_Y \bigl| \int_X f(x', y)\,d\mu(x') \bigr|^p d\nu(y) \bigr)^{1/p'}$. There is no problem with the theorem if
this factor is 0, since then the left side of the inequality of the theorem is 0. A
problem occurs if this factor is infinite. Instead of trying to prove directly that this
factor is finite (and hence the division is allowable), let us retreat to the special
case that $f$ is bounded and is equal to 0 off an abstract rectangle of finite $\mu \times \nu$
measure. Then the factor in question is certainly finite, the division is allowable,
and we obtain the inequality of the theorem. To handle general measurable $f \ge 0$,
we do not attempt to justify this division. Instead, we observe that the validity
of the inequality in the theorem when $f$ is bounded and is equal to 0 off a set of
finite $\mu \times \nu$ measure implies the validity of the inequality in general, by a routine
application of monotone convergence. This completes the proof.

The last basic fact about $L^p$ spaces is the identification of continuous linear
functionals on $L^p$, at least when $p$ is finite. Deriving the necessary tools for this
analysis will require a digression, and we shall return to this topic in Section 5.
Meanwhile, we can easily obtain one part of the identification of continuous linear
functionals, as in Proposition 9.8 below. It amounts to a combination of Hölder’s
inequality and a converse, and it gives a way of computing $L^p$ norms by starting
with computations that are linear.


Proposition 9.8. Let $(X, \mathcal{A}, \mu)$ be any measure space, let $1 \le p \le \infty$, and
let $p'$ be the dual index. If $p < \infty$, then

\[ \|f\|_p = \sup_{\substack{g \in L^{p'}, \\ \|g\|_{p'} \le 1}} \Bigl| \int_X fg\,d\mu \Bigr|, \]

and this equality remains valid for $p = \infty$ if $\mu$ is $\sigma$-finite.


REMARK. The equality can fail when $p = \infty$ and $\mu$ is not $\sigma$-finite. Problem 4
at the end of the chapter gives an example.


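PROOF. With $1 \le p \le \infty$, if $g$ is in $L^{p'}$ with $\|g\|_{p'} \le 1$, then Hölder’s inequality
gives $\bigl| \int_X fg\,d\mu \bigr| \le \int_X |fg|\,d\mu \le \|f\|_p \|g\|_{p'} \le \|f\|_p$. Taking the supremum over
$g$ with $\|g\|_{p'} \le 1$ shows that $\sup_g \bigl| \int_X fg\,d\mu \bigr| \le \|f\|_p$.

For the reverse inequality we may assume that $\|f\|_p \ne 0$. First suppose that
$1 < p < \infty$. Define $g(x)$ by

\[
g(x) = \begin{cases} \|f\|_p^{-p/p'}\,\overline{f(x)}\,|f(x)|^{p-2} & \text{if } f(x) \ne 0, \\ 0 & \text{if } f(x) = 0. \end{cases}
\]

Then $\int_X |g(x)|^{p'}\,d\mu = \|f\|_p^{-p} \int_X |f(x)|^{(p-1)p'}\,d\mu = \|f\|_p^{-p} \int_X |f(x)|^p\,d\mu =
1$. For this $g$, we have $\bigl| \int_X fg\,d\mu \bigr| = \|f\|_p^{-p/p'} \int_X |f|^p\,d\mu = \|f\|_p$. Thus the
supremum over the relevant $g$’s of $\bigl| \int_X fg\,d\mu \bigr|$ is $\ge \|f\|_p$.

Next suppose that $p = 1$. If we define $g(x)$ to be $\overline{f(x)}/|f(x)|$ when $f(x) \ne 0$
and to be 0 when $f(x) = 0$, then $\|g\|_\infty = 1$ and $\bigl| \int_X fg\,d\mu \bigr| = \int_X |f|^2/|f|\,d\mu =
\|f\|_1$, and the supremum over $g$ of $\bigl| \int_X fg\,d\mu \bigr|$ is $\ge \|f\|_1$.

Finally suppose that $p = \infty$. Let $\epsilon > 0$ be given with $\epsilon < \|f\|_\infty$, and let $E$ be
the set where $|f(x)| \ge \|f\|_\infty - \epsilon$. Since $\mu$ is $\sigma$-finite, there must exist a subset
of $E$ with nonzero finite measure. If $F$ is such a subset and if $g(x)$ is defined to be
$\mu(F)^{-1}\,\overline{f(x)}/|f(x)|$ when $x$ is in $F$ and to be 0 when $x$ is in $F^c$, then $\|g\|_1 = 1$
and $\bigl| \int_X fg\,d\mu \bigr| = \mu(F)^{-1} \int_F |f|\,d\mu \ge \|f\|_\infty - \epsilon$. Thus the supremum over
$g$ of $\bigl| \int_X fg\,d\mu \bigr|$ is $\ge \|f\|_\infty - \epsilon$. Since $\epsilon$ is arbitrary, the supremum over $g$ of
$\bigl| \int_X fg\,d\mu \bigr|$ is $\ge \|f\|_\infty$.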
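
The extremal function $g$ used in the case $1 < p < \infty$ of the proof can be built explicitly on a finite measure space. The Python sketch below is an illustration under assumptions: the measure space is a finite set with point masses `mu`, the data are random complex numbers, and the helper names `lp_norm` and `extremal_g` are hypothetical.

```python
import random

def lp_norm(f, mu, p):
    """L^p norm (1 <= p < infinity) of f on a finite set with positive point masses mu."""
    return sum(abs(v) ** p * m for v, m in zip(f, mu)) ** (1.0 / p)

def extremal_g(f, mu, p):
    """g = ||f||_p^(-p/p') * conj(f) * |f|^(p-2) where f != 0, and 0 elsewhere (1 < p < infinity)."""
    p_dual = p / (p - 1.0)
    scale = lp_norm(f, mu, p) ** (-p / p_dual)
    return [scale * v.conjugate() * abs(v) ** (p - 2.0) if v != 0 else 0j for v in f]

random.seed(5)
mu = [random.uniform(0.1, 2.0) for _ in range(8)]
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]

p = 3.0
g = extremal_g(f, mu, p)
pairing = sum(a * b * m for a, b, m in zip(f, g, mu))  # integral of f*g with respect to mu
norm_f = lp_norm(f, mu, p)
norm_g = lp_norm(g, mu, p / (p - 1.0))

print(abs(norm_g - 1.0) < 1e-9)        # g has norm 1 in L^{p'}
print(abs(pairing - norm_f) < 1e-9)    # the pairing attains ||f||_p
```

The complex conjugate in the definition of `extremal_g` is what makes the pairing real and nonnegative when $f$ is complex valued.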

In this section we collect results about $L^p$ spaces that extend facts proved about
$L^1$, $L^2$, and $L^\infty$ in the first three sections of Chapter VI.


Proposition 9.9. If $\mu$ is a Borel measure on a nonempty open set $V$ in $\mathbb{R}^N$ and
if $1 \le p < \infty$, then
(a) $C_{\mathrm{com}}(V)$ is dense in $L^p(V, \mu)$,
(b) the smallest closed subspace of $L^p(V, \mu)$ containing all indicator functions
of compact subsets of $V$ is $L^p(V, \mu)$ itself,
(c) $L^p(V, \mu)$ is separable.


PROOF. Parts (a) and (b) are proved from Lemma 6.22c, the regularity of
$\mu$ (Theorem 6.25), Proposition 9.4, and Proposition 5.56 by the same kind of
argument as for Corollary 6.4. Part (c) is obtained as a consequence in the same
way that Corollary 6.27d follows from the other parts of that corollary.


The remaining results in this section concern Lebesgue measure in $\mathbb{R}^N$, and
the $L^p$ spaces are understood to be $L^p(\mathbb{R}^N, \{\text{Borel sets}\}, dx)$.


Proposition 9.10. Let $1 \le p \le \infty$, and let $p'$ be the dual index. Convolution
is defined in the following additional cases beyond those listed in Proposition
6.14, and the indicated inequalities hold:

(e) for $f$ in $L^1(\mathbb{R}^N, dx)$ and $g$ in $L^p(\mathbb{R}^N, dx)$, and then $\|f * g\|_p \le \|f\|_1 \|g\|_p$,
for $f$ in $L^p(\mathbb{R}^N, dx)$ and $g$ in $L^1(\mathbb{R}^N, dx)$, and then $\|f * g\|_p \le \|f\|_p \|g\|_1$,

(f) for $f$ in $L^p(\mathbb{R}^N, dx)$ and $g$ in $L^{p'}(\mathbb{R}^N, dx)$, and then $\|f * g\|_{\sup} \le
\|f\|_p \|g\|_{p'}$,
for $f$ in $L^{p'}(\mathbb{R}^N, dx)$ and $g$ in $L^p(\mathbb{R}^N, dx)$, and then $\|f * g\|_{\sup} \le
\|f\|_{p'} \|g\|_p$.


PROOF. The two conclusions in (e) follow from Minkowski’s inequality for
integrals (Theorem 9.7) in the same way that the special case of $p = 2$ was
proved in Proposition 6.14 from Theorem 5.60. The two conclusions in (f)
follow from Hölder’s inequality (Theorem 9.2) in the same way that the special
case $p = p' = 2$ was proved in Proposition 6.14 from the Schwarz inequality.
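
The inequalities in (e) and (f) have exact analogs for finitely supported sequences on the integers with counting measure, and these are easy to test numerically. The Python sketch below checks that discrete analog only; it is not a computation with Lebesgue measure on $\mathbb{R}^N$, and the helper names `lp_norm` and `convolve` are assumptions.

```python
import random

INF = float("inf")

def lp_norm(f, p):
    """l^p norm of a finitely supported sequence (counting measure on the integers)."""
    if p == INF:
        return max(abs(v) for v in f)
    return sum(abs(v) ** p for v in f) ** (1.0 / p)

def convolve(f, g):
    """Discrete convolution (f * g)(n) = sum_k f(n - k) g(k) of finitely supported sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

random.seed(6)
f = [random.uniform(-1.0, 1.0) for _ in range(12)]
g = [random.uniform(-1.0, 1.0) for _ in range(9)]
h = convolve(f, g)

# Analog of (e): ||f * g||_p <= ||f||_1 ||g||_p.
case_e = [lp_norm(h, p) <= lp_norm(f, 1) * lp_norm(g, p) * (1 + 1e-12)
          for p in (1, 1.5, 2, 5, INF)]
# Analog of (f): sup |f * g| <= ||f||_p ||g||_p' with p' = p / (p - 1).
case_f = [lp_norm(h, INF) <= lp_norm(f, p) * lp_norm(g, p / (p - 1.0)) * (1 + 1e-12)
          for p in (1.5, 2.0, 5.0)]

print(all(case_e), all(case_f))
```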


Proposition 9.11. If $1 \le p < \infty$, then translation of a function is continuous
in the translation parameter in $L^p(\mathbb{R}^N, dx)$. In other words, if $f$ is in $L^p(\mathbb{R}^N, dx)$,
then $\lim_{h \to 0} \|\tau_{t+h} f - \tau_t f\|_p = 0$ for all $t$.


PROOF. This follows from the denseness of $C_{\mathrm{com}}(\mathbb{R}^N)$ in $L^p(\mathbb{R}^N, dx)$ (Proposition
9.9a) and is proved in the same way that Proposition 6.16 is derived from
Corollary 6.4a.

Proposition 9.12. Let $1 \le p \le \infty$, and let $p'$ be the dual index. Then the
convolution of an $L^p$ function with an $L^{p'}$ function results in an everywhere-defined
bounded uniformly continuous function, not just an $L^\infty$ function. Moreover,

\[ \|f * g\|_{\sup} \le \|f\|_p \|g\|_{p'}. \]


PROOF. This extends Proposition 6.18 and is derived for $1 \le p \le \infty$ from
Propositions 9.10 and 9.11 in the same way that Proposition 6.18 is derived for
$p = 2$ from Propositions 6.14 and 6.16.


Theorem 9.13. Let $\varphi$ be in $L^1(\mathbb{R}^N, dx)$, define
$\varphi_\varepsilon(x) = \varepsilon^{-N} \varphi(\varepsilon^{-1}x)$ for $\varepsilon > 0$,
and put $c = \int_{\mathbb{R}^N} \varphi(x)\,dx$. If $f$ is in $L^p(\mathbb{R}^N, dx)$ with $1 \le p < \infty$, then
\[ \lim_{\varepsilon \downarrow 0} \|\varphi_\varepsilon * f - cf\|_p = 0. \]


PROOF. This is derived from Minkowski’s inequality for integrals (Theorem
9.7) and the continuity of translation in $L^p$ (Proposition 9.11) in the same way
that Theorem 6.20a is derived for $p = 2$ from Theorem 5.60 and Proposition
6.16.


3. Jordan and Hahn Decompositions 


Now we digress before returning in Section 5 to the subject of continuous linear
functionals on $L^p$ spaces. The subject of the present section is decompositions of
additive and completely additive real-valued set functions into positive and negative
parts. This material will be applied in Section 4 to obtain the Radon-Nikodym
Theorem, an abstract generalization of some consequences of Lebesgue’s theory
of differentiation of integrals. In turn, we shall use the Radon-Nikodym Theorem
in Section 5 to address the subject of continuous linear functionals on $L^p$ spaces.

A real-valued additive set function $\nu$ on an algebra of sets is said to be bounded
if $|\nu(E)| \le C$ for all $E$ in the algebra. A real-valued completely additive set
function on a $\sigma$-algebra of sets is said to be a signed measure.


Theorem 9.14 (Jordan decomposition). Let $\nu$ be a bounded additive set
function on an algebra $\mathcal{A}$ of sets, and define set functions $\nu^+$ and $\nu^-$ on $\mathcal{A}$
by

\[ \nu^+(E) = \sup_{\substack{F \subseteq E, \\ F \in \mathcal{A}}} \nu(F) \quad\text{and}\quad \nu^-(E) = -\inf_{\substack{F \subseteq E, \\ F \in \mathcal{A}}} \nu(F). \]

Then $\nu^+$ and $\nu^-$ are nonnegative bounded additive set functions on $\mathcal{A}$ such that
$\nu = \nu^+ - \nu^-$. They are completely additive if $\nu$ is completely additive. In any
event, the decomposition $\nu = \nu^+ - \nu^-$ is minimal in the sense that an equality
$\nu = \mu^+ - \mu^-$ in which $\mu^+$ and $\mu^-$ are nonnegative bounded additive set functions
must have $\nu^+ \le \mu^+$ and $\nu^- \le \mu^-$.


PROOF. First let us see that v* is additive always. In fact, let FE, and E, be 
disjoint members of A. If F C E,; U En, then the additivity of v implies that 
V(F) = v(FN E,}) + v(F 9 Ep) < vt(E;) + vt (22). Hence 


vt(E, U Eo) < vt (E1) + vt (Ed). 


On the other hand, if F} C FE, and Fy C E>, then v(F}) + v(Fo) = v(F, UF) < 
v+(E, U Ed). Taking the supremum over F; and then over F> gives 


vt (E1) + v* (Eo) < vt (Ey U Ep). 


Thus vt is additive. 

Second let us see that v* is completely additive if v is completely additive. 
Let E,, be a disjoint sequence of sets in .A whose union E is in A. If F C E, then 
the complete additivity of v implies that v(F) = }°, v(F MN En) < Yo, vt (En). 
Hence v*(E) < }°™, vt (E,,). On the other hand, the fact that vt is nonnegative 
additive implies for every N that ys vt (E,) = vt (E, U---U En) < vt (EB). 
Thus )°°°, v+(E,) < vt (E). Therefore v* is completely additive. 

Third let us see that v = vt — v-. This equality will imply also that v—~ 
is additive and that v~ is completely additive if v is completely additive. Form 
v(E)+v7(E) = v(E)+suprcg{—v(F)}; we are to show that this equals v*(E). 
For any F C E, we have v(E) +(-v(F)) = v(E — F) < vt(E). Taking the 
supremum over F gives v(E) + v-(E) < vt(E). In the reverse direction, 
F C E implies that v(F) = v(E) — v(E — F) < v(E) + supger{—v(G)} = 
v(E) + v-(E). Taking the supremum over F gives v*(E) < v(E) + v7 (E). 
This proves the decomposition v = vt — v-. 

Finally we prove the minimality of the decomposition. Let $\nu = \mu^+ - \mu^-$
with $\mu^+$ and $\mu^-$ nonnegative additive. If $F \subseteq E$, then we can write $\nu(F) =
\mu^+(F) - \mu^-(F) \le \mu^+(F) \le \mu^+(E)$. Taking the supremum over $F$ gives
$\nu^+(E) \le \mu^+(E)$. Similarly $\nu^- \le \mu^-$.
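
The formulas for $\nu^+$ and $\nu^-$ can be checked by brute force on a finite set, where every subset is measurable and a signed set function is determined by point weights. The following is only an illustrative sketch: the set $X = \{a, b, c, d\}$, the weights, and the function names are hypothetical choices made for the example and are not part of the text.

```python
from itertools import combinations

def subsets(points):
    """Yield every subset of `points` as a frozenset."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)

def nu(weights, E):
    """Signed set function: nu(E) is the sum of the point weights over E."""
    return sum(weights[x] for x in E)

def nu_plus(weights, E):
    """nu^+(E) = sup of nu(F) over all subsets F of E."""
    return max(nu(weights, F) for F in subsets(E))

def nu_minus(weights, E):
    """nu^-(E) = -inf of nu(F) over all subsets F of E."""
    return -min(nu(weights, F) for F in subsets(E))

# Hypothetical integer weights, chosen so that all arithmetic is exact.
weights = {"a": 3, "b": -2, "c": 5, "d": -1}
X = frozenset(weights)

for E in subsets(X):
    # Jordan decomposition: nu = nu^+ - nu^- on every set E.
    assert nu(weights, E) == nu_plus(weights, E) - nu_minus(weights, E)
```

In this finite setting $\nu^+$ simply collects the positive weights and $\nu^-$ the absolute values of the negative ones, which is the picture behind the minimality statement.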


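Theorem 9.15 (Hahn decomposition). If $\nu$ is a bounded signed measure on a
$\sigma$-algebra $\mathcal{A}$ of subsets of $X$, then there exist disjoint measurable sets $P$ and $N$
in $\mathcal{A}$ with $X = P \cup N$ such that $\nu(E) \ge 0$ for all sets $E \subseteq P$ and $\nu(E) \le 0$ for
all sets $E \subseteq N$.

PROOF. Write $\nu = \nu^+ - \nu^-$ as in Theorem 9.14. If $\epsilon > 0$ is given, choose $A$
in $\mathcal{A}$ with $\nu(A) \ge \nu^+(X) - \epsilon$. Then
\[
\nu^-(A) = \nu^+(A) - \nu(A) \le \nu^+(A) - \nu^+(X) + \epsilon \le \epsilon
\]
and
\[
\nu^+(A^c) = \nu^+(X) - \nu^+(A) \le \nu(A) + \epsilon - \nu^+(A) \le \epsilon.
\]
By taking $P_0 = A$ and $N_0 = A^c$, we see that for any $\epsilon > 0$ we can write
$X = P_0 \cup N_0$ disjointly with $\nu^+(N_0) \le \epsilon$ and $\nu^-(P_0) \le \epsilon$.

For $n \ge 1$, write $X = P_n \cup N_n$ disjointly with $\nu^+(N_n) \le 2^{-n}$ and $\nu^-(P_n) \le
2^{-n}$. Define
\[
P = \bigcup_{n=1}^\infty \bigcap_{m=n}^\infty P_m
\qquad\text{and}\qquad
N = P^c = \bigcap_{n=1}^\infty \bigcup_{m=n}^\infty N_m.
\]
These sets are in $\mathcal{A}$ since $\mathcal{A}$ is a $\sigma$-algebra. Theorem 9.14 shows that $\nu^-$ is
completely additive, and hence $\nu^-(P) \le \sum_{n=1}^\infty \nu^-\big(\bigcap_{m=n}^\infty P_m\big)$. The right side
is 0 since $\nu^-\big(\bigcap_{m=n}^\infty P_m\big) \le \nu^-(P_{n+k}) \le 2^{-n-k}$ for all $k \ge 0$, and
therefore $\nu^-(P) = 0$. In addition, every $n$ has
\[
\nu^+(N) \le \nu^+\Big(\bigcup_{m=n}^\infty N_m\Big) \le \sum_{m=n}^\infty \nu^+(N_m) \le \sum_{m=n}^\infty 2^{-m} = 2^{-n+1},
\]
and therefore $\nu^+(N) = 0$. Consequently every $E \subseteq P$ in $\mathcal{A}$ has $\nu^-(E) = 0$ and
$\nu(E) = \nu^+(E) \ge 0$, and every $E \subseteq N$ in $\mathcal{A}$ has $\nu^+(E) = 0$ and $\nu(E) = -\nu^-(E) \le 0$.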
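
On a finite set the Hahn decomposition can be written down directly, and the example also shows why it need not be unique: a point of weight 0 may be placed in either piece. The sketch below is only an illustration under that finite assumption; the weights and helper names are hypothetical.

```python
from itertools import combinations

def subsets(points):
    """Yield every subset of `points` as a frozenset."""
    pts = list(points)
    for r in range(len(pts) + 1):
        for combo in combinations(pts, r):
            yield frozenset(combo)

def hahn_decomposition(weights):
    """Return (P, N): P holds the points of weight >= 0, N the remaining points."""
    P = frozenset(x for x, w in weights.items() if w >= 0)
    N = frozenset(weights) - P
    return P, N

def nu(weights, E):
    """Signed measure given by summing point weights."""
    return sum(weights[x] for x in E)

# Hypothetical weights; the point "z" has weight 0 and could equally go into N.
weights = {"a": 3, "b": -2, "c": 5, "d": -1, "z": 0}
P, N = hahn_decomposition(weights)

assert all(nu(weights, E) >= 0 for E in subsets(P))
assert all(nu(weights, E) <= 0 for E in subsets(N))
```

Moving the weightless point `z` from `P` to `N` gives a second valid decomposition, differing from the first only on a set on which both $\nu^+$ and $\nu^-$ vanish.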


4. Radon-Nikodym Theorem 


The Lebesgue decomposition of Chapter VII says that any Stieltjes measure $\mu$ on
the line decomposes as $\mu(E) = \int_E f\,dx + \mu_s(E)$ with $\mu_s$ concentrated on a Borel
set of Lebesgue measure 0. The function $f$ is obtained in that chapter as the
derivative almost everywhere of the distribution function of $\mu$, hence as the limit
of $\mu(I)/m(I)$ as intervals $I$ shrink to a point; here $m$ is Lebesgue measure. In
this formulation of the result, the geometry of the line plays an essential role, and
attempts to generalize to abstract settings the construction of $f$ from limits of
$\mu(I)/m(I)$ have not been fruitful.

Nevertheless, the Lebesgue decomposition itself turns out to be a general
measure-theory theorem, valid for any two measures in place of $\mu$ and $dx$, as
long as suitable finiteness conditions are satisfied. For a reinterpretation of the
results of Chapter VII, the heart of the matter is that one can tell in advance which
$\mu$'s have $\mu(E) = \int_E f\,dx$ with the singular term $\mu_s$ absent. The answer is given
by the equivalent conditions of Proposition 7.11, which are taken in that chapter as
a definition of "absolute continuity" of $\mu$ with respect to $dx$. The remarkable fact
is that those conditions continue to be equivalent when any two finite measures
replace $\mu$ and $dx$. This is the content of the Radon-Nikodym Theorem, which
we shall prove in this section, and then a version of the Lebesgue decomposition
will follow as a consequence.

Let $X$ be a nonempty set, and let $\mathcal{A}$ be a $\sigma$-algebra of subsets of $X$. If $\mu$ and $\nu$
are measures defined on $\mathcal{A}$, we say that $\nu$ is absolutely continuous with respect
to $\mu$, written $\nu \ll \mu$, if $\nu(E) = 0$ whenever $\mu(E) = 0$.

Theorem 9.16 (Radon-Nikodym Theorem). Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure
space, and let $\nu$ be a $\sigma$-finite measure on $\mathcal{A}$ with $\nu \ll \mu$. Then there exists a
measurable $f \ge 0$ such that $\nu(E) = \int_E f\,d\mu$ for all $E$ in $\mathcal{A}$, and $f$ is unique up
to a set of $\mu$ measure 0.


The Radon-Nikodym Theorem has two chief initial applications. One is to
the identification of continuous linear functionals on $L^p$ for $1 \le p < \infty$, and the
other is to the construction of "conditional expectation" in probability theory. The
application to $L^p$ will be given in Section 5, and the application to conditional
expectation appears in Problems 23-26 at the end of the chapter.

In both applications one needs a version of the theorem in which the completely
additive set function $\nu$ is complex-valued but not necessarily $\ge 0$. We take up
this extension of the theorem later in this section.

Most of the effort in the proof goes into showing existence when $\mu$ and $\nu$
are both finite measures, as we shall see. In this setting we can quickly use the
Hahn decomposition (Theorem 9.15) to get an idea how to construct $f$: Imagine
that $\nu(E) = \int_E f\,d\mu$ for all $E$. Fix $c$ and $d$, and let $S$ be the set of $x$'s where
$c \le f(x) \le d$. On any subset $E$ of $S$, we then have $c\mu(E) \le \nu(E) \le d\mu(E)$.
In other words, the bounded signed measure $\nu - c\mu$ is $\ge 0$ on every subset of
$S$, and the bounded signed measure $\nu - d\mu$ is $\le 0$ on every subset of $S$. Let
$X = P_c \cup N_c$ and $X = P_d \cup N_d$ be Hahn decompositions of $\nu - c\mu$ and $\nu - d\mu$
with respect to $\mu$. Then it is reasonable to expect $S$ to be $P_c \cap N_d$. In particular, $c$
is a good lower bound for the values of $f$ on $S$. It is easy to imagine that we can
use this process repeatedly to obtain a monotone sequence of functions $f_n \ge 0$
tending to the desired function $f$.

Actually, this argument can be pushed through, but handling the details is a 
good deal more complicated than one might at first suppose. The reason is that 
a Hahn decomposition is not necessarily unique. Sets of measure 0 account for 
the nonuniqueness, and the particular measures yielding these sets of measure 0 
are constantly changing. The complication is that one has to adjust all the Hahn 
decompositions to satisfy various compatibility conditions. We shall not pursue 
this idea because a simpler proof is available. 


PROOF OF UNIQUENESS IN THEOREM 9.16. Suppose that $f$ and $g$ are nonnegative
measurable functions with $\int_E f\,d\mu = \int_E g\,d\mu$ for every measurable
$E$. If $F$ is a set where the equal integrals $\int_F f\,d\mu$ and $\int_F g\,d\mu$ are finite, then
$\int_{E \cap F} (f - g)\,d\mu = 0$ for every measurable subset $E \cap F$ of $F$. If $E$ is taken
as the set where $f \ge g$, then Corollary 5.23 shows that $f = g$ a.e. on $E \cap F$.
Similarly $f = g$ a.e. on the set $E^c \cap F$, where $f < g$. Thus $f = g$ a.e. on $F$. By
$\sigma$-finiteness of $\mu$ and $\nu$, we can write $X = \bigcup_{n=1}^\infty X_n$ disjointly with $\mu(X_n)$ and
$\nu(X_n)$ finite for all $n$. Taking $F$ equal to each $X_n$ in turn, we see that $f = g$ a.e.
on each $X_n$, and we conclude that $f = g$ a.e. on $X$.

PROOF OF EXISTENCE IN THEOREM 9.16 WHEN $\mu$ AND $\nu$ ARE FINITE. Let $\mathcal{F}(\nu)$
be the set of all $f \ge 0$ in $L^1(X, \mu)$ such that $\int_E f\,d\mu \le \nu(E)$ for all sets $E$ in
$\mathcal{A}$. The zero function is in $\mathcal{F}(\nu)$, and thus it makes sense to define
\[
C = \sup_{f \in \mathcal{F}(\nu)} \int_X f\,d\mu.
\]
Let $\{f_n\}$ be a sequence in $\mathcal{F}(\nu)$ with $\lim_n \int_X f_n\,d\mu = C$.
Let us show that there is no loss of generality in assuming that the $f_n$ satisfy
$f_1 \le f_2 \le \cdots$. To show this, it is enough to show that $g$ and $h$ in $\mathcal{F}(\nu)$ implies
that $\max\{g, h\}$ is in $\mathcal{F}(\nu)$. We have
\[
\int_E \max\{g, h\}\,d\mu = \int_{E \cap \{g \ge h\}} g\,d\mu + \int_{E \cap \{g < h\}} h\,d\mu
\le \nu(E \cap \{g \ge h\}) + \nu(E \cap \{g < h\}) = \nu(E),
\]
and hence $\max\{g, h\}$ is indeed in $\mathcal{F}(\nu)$.
With the $f_n$'s now increasing with $n$, put $f(x) = \lim_n f_n(x)$. Monotone
convergence shows that $f$ is in $\mathcal{F}(\nu)$ and $\int_X f\,d\mu = C$. Define
\[
\nu_0(E) = \nu(E) - \int_E f\,d\mu.
\]
Then $\nu_0$ is a measure, $\nu_0 \ll \mu$, and the class $\mathcal{F}(\nu_0)$ for $\nu_0$ consists of 0 alone. We
shall complete this part of the proof by showing that $\nu_0 = 0$.

If $\nu_0 \ne 0$, choose $n$ large enough so that $\nu_0(X) - \frac{1}{n}\mu(X) > 0$, and put
$\nu_0' = \nu_0 - \frac{1}{n}\mu$. Let $X = P \cup N$ be a Hahn decomposition for $\nu_0'$ as in Theorem
9.15, and define $g = \frac{1}{n} I_P$. Then the calculation
\[
\int_E g\,d\mu = \tfrac{1}{n}\mu(P \cap E) = \nu_0(P \cap E) - \nu_0'(P \cap E) \le \nu_0(P \cap E) \le \nu_0(E)
\]
shows that $g$ is in $\mathcal{F}(\nu_0)$. Hence $g = 0$ a.e. $[d\mu]$, and $\mu(P) = 0$. Since $\nu_0 \ll \mu$,
we obtain $\nu_0(P) = 0$ and therefore also $\nu_0'(P) = 0$. Then $\nu_0' \le 0$, and we must
have $\nu_0(X) - \frac{1}{n}\mu(X) \le 0$. This contradicts the choice of $n$, and the proof of
existence is complete when $\mu$ and $\nu$ are finite.


PROOF OF EXISTENCE IN THEOREM 9.16 WHEN $\mu$ AND $\nu$ ARE $\sigma$-FINITE. Write
$X$ as the countable disjoint union of sets $X_n$ such that $\mu(X_n)$ and $\nu(X_n)$ are both
finite. If we put $\mu_n(E) = \mu(E \cap X_n)$ and $\nu_n(E) = \nu(E \cap X_n)$, then $\mu_n$ and $\nu_n$ are
finite measures such that $\nu_n \ll \mu_n$, and the above special case produces functions
$f_n \ge 0$ such that $\nu_n(E) = \int_E f_n\,d\mu_n$ for all $E$. Since $\nu_n(X_n^c) = 0$, we may
assume that $f_n(x) = 0$ for $x \notin X_n$. Let $f \ge 0$ be the measurable function that
equals $f_n$ on $X_n$ for each $n$. Then our formula reads $\nu(E \cap X_n) = \int_{E \cap X_n} f\,d\mu$
for all $n$ and for all $E$. Summing on $n$, we obtain $\nu(E) = \int_E f\,d\mu$ for all $E$ in $\mathcal{A}$.

Corollary 9.17. Let $(X, \mathcal{A}, \mu)$ be a finite measure space, and let $\nu$ be a
(real-valued) bounded signed measure on $\mathcal{A}$ with $\nu \ll \mu$ in the sense that $\mu(E) = 0$
implies $\nu(E) = 0$. Then there exists a function $f$ in $L^1(X, \mu)$ such that $\nu(E) =
\int_E f\,d\mu$ for all $E$ in $\mathcal{A}$, and $f$ is unique up to a set of $\mu$ measure 0.


PROOF. Let $\nu = \nu^+ - \nu^-$ be the Jordan decomposition of $\nu$ as in Theorem 9.14,
and let $X = P \cup N$ be a Hahn decomposition of $\nu$ as in Theorem 9.15. Suppose
$\mu(E) = 0$. Since $\mu$ is nonnegative, we obtain $\mu(E \cap P) = 0$ and $\mu(E \cap N) = 0$,
and the assumption $\nu \ll \mu$ forces
\[
0 = \nu(E \cap P) = \nu^+(E \cap P) = \nu^+(E)
\qquad\text{and}\qquad
0 = \nu(E \cap N) = -\nu^-(E \cap N) = -\nu^-(E).
\]
Therefore $\nu^+ \ll \mu$ and $\nu^- \ll \mu$, and the corollary follows by applying Theorem
9.16 to $\nu^+$ and $\nu^-$ separately.
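
For a measure given by point masses on a finite set, Theorem 9.16 and Corollary 9.17 reduce to a pointwise quotient: the density is $\nu(\{x\})/\mu(\{x\})$ wherever $\mu(\{x\}) > 0$, and its value on $\mu$-null points is irrelevant. The sketch below illustrates only this special case; the dictionaries `mu` and `nu` and the function names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def radon_nikodym(mu, nu):
    """Return f with nu(E) = sum over x in E of f(x) * mu({x}), assuming nu << mu."""
    f = {}
    for x in mu:
        if mu[x] == 0:
            if nu[x] != 0:
                raise ValueError("nu is not absolutely continuous with respect to mu")
            f[x] = Fraction(0)  # any value works on a point of mu measure 0
        else:
            f[x] = Fraction(nu[x]) / Fraction(mu[x])
    return f

def integrate(f, mu, E):
    """Integral of f over E with respect to the point-mass measure mu."""
    return sum(f[x] * mu[x] for x in E)

# Hypothetical finite measure mu and bounded signed measure nu with nu << mu.
mu = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(0)}
nu = {"a": Fraction(3), "b": Fraction(-1), "c": Fraction(0)}
f = radon_nikodym(mu, nu)

for E in (["a"], ["b"], ["a", "b"], ["a", "b", "c"]):
    assert integrate(f, mu, E) == sum(nu[x] for x in E)
```

The freedom in choosing `f["c"]` is the finite version of the statement that $f$ is unique only up to a set of $\mu$ measure 0.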


Corollary 9.18. Let $(X, \mathcal{A}, \mu)$ be a $\sigma$-finite measure space, and let $\nu$ be a
$\sigma$-finite measure on $\mathcal{A}$. Then there exist a measurable $f \ge 0$ and a set $S$ in $\mathcal{A}$
with $\mu(S) = 0$ such that $\nu = f\,d\mu + \nu_s$, where $\nu_s(E) = \nu(E \cap S)$. The measure
$\nu_s$ is unique, and the function $f$ is unique up to a set of $\mu$ measure 0.


REMARK. The measure $\nu_s$, being carried on a set of $\mu$ measure 0, is said to be
singular with respect to $\mu$. The measure $f\,d\mu$ is, of course, absolutely continuous
with respect to $\mu$. The decomposition of $\nu$ into the sum of an absolutely continuous
part and a singular part is called the Lebesgue decomposition of $\nu$ with respect to
$\mu$. The corollary asserts that this decomposition of measures exists and is unique.


PROOF. As in the proof of Theorem 9.16, we can reduce matters to the case
that $\nu$ and $\mu$ are both finite, and it is therefore enough to handle this special
case. Among all sets $E$ in $\mathcal{A}$ with $\mu(E) = 0$, let $C$ be the supremum of $\nu(E)$.
The number $C$ is finite, being $\le \nu(X)$. Choose a sequence of sets $E_n$ in $\mathcal{A}$ with
$\mu(E_n) = 0$ and $\nu(E_n)$ increasing to $C$. Without loss of generality, we may assume
that $E_1 \subseteq E_2 \subseteq \cdots$. Put $S = \bigcup_n E_n$. Proposition 5.2 shows that $\mu(S) = 0$ and
$\nu(S) = C$. Define $\nu_a(E) = \nu(E \cap S^c)$ and $\nu_s(E) = \nu(E \cap S)$. Then $\nu_a$ and $\nu_s$
are measures, and $\nu = \nu_a + \nu_s$.

Certainly $\nu_s$ is singular with respect to $\mu$, being carried on the set $S$ of $\mu$
measure 0. Let us see that $\nu_a$ is absolutely continuous. Thus suppose that $\mu(E) =
0$. Then $\mu(S \cup E) \le \mu(S) + \mu(E) = 0$, and the construction of $C$ shows that
$\nu(S \cup E) \le C = \nu(S)$. Therefore $\nu(S \cup E) - \nu(S) \le 0$ and $\nu(S \cup E) - \nu(S) = 0$.
Hence $0 = \nu(S \cup E) - \nu(S) = \nu(E - S) = \nu(E \cap S^c) = \nu_a(E)$, and $\nu_a$ is indeed
absolutely continuous. Applying the Radon-Nikodym Theorem (Theorem 9.16),
we obtain $\nu = \nu_a + \nu_s = f\,d\mu + \nu_s$. This proves existence.


For uniqueness, suppose that we have $\nu = f\,d\mu + \nu_s = f^{\#}\,d\mu + \nu_s^{\#}$ with $\nu_s$
and $\nu_s^{\#}$ carried on respective sets $S$ and $S^{\#}$ of $\mu$ measure 0. The functions $f$ and $f^{\#}$
are integrable with respect to $\mu$, and we have $\int_E (f - f^{\#})\,d\mu = \nu_s^{\#}(E) - \nu_s(E)$.
Taking $E$ to be any subset $T$ in $\mathcal{A}$ of $S \cup S^{\#}$, we see that $0 = \nu_s^{\#}(T) - \nu_s(T)$.
Therefore $\nu_s^{\#}(T) = \nu_s(T)$ whenever $T \subseteq S \cup S^{\#}$. On $(S \cup S^{\#})^c$, we have
$\nu_s^{\#}((S \cup S^{\#})^c) = \nu_s((S \cup S^{\#})^c) = 0$. Therefore $\nu_s^{\#} = \nu_s$. The uniqueness of
the function part follows from the uniqueness in the Radon-Nikodym Theorem,
which is part of the statement of that theorem (Theorem 9.16).
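
In the finite point-mass setting the construction in the proof becomes explicit: the set $S$ of largest $\nu$ measure among $\mu$-null sets is just the set of points of $\mu$ mass 0. The following sketch illustrates that special case only; the data and function names are hypothetical.

```python
from fractions import Fraction

def lebesgue_decomposition(mu, nu):
    """Split nu as f d(mu) + nu_s, with nu_s carried on the mu-null set S."""
    S = frozenset(x for x in mu if mu[x] == 0)
    nu_s = {x: (nu[x] if x in S else Fraction(0)) for x in mu}   # singular part
    nu_a = {x: (Fraction(0) if x in S else nu[x]) for x in mu}   # absolutely continuous part
    f = {x: (Fraction(0) if x in S else nu[x] / mu[x]) for x in mu}
    return f, S, nu_a, nu_s

# Hypothetical finite measures: the point "b" has mu mass 0 but positive nu mass.
mu = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(2)}
nu = {"a": Fraction(1), "b": Fraction(4), "c": Fraction(3)}
f, S, nu_a, nu_s = lebesgue_decomposition(mu, nu)

for x in mu:
    # nu = f d(mu) + nu_s, checked point by point.
    assert f[x] * mu[x] + nu_s[x] == nu[x]
    assert nu_a[x] == f[x] * mu[x]
```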


5. Continuous Linear Functionals on $L^p$


We return to the question of identifying the continuous linear functionals on $L^p$
spaces. Let $(X, \mathcal{A}, \mu)$ be a fixed $\sigma$-finite measure space. The space $L^p(X, \mu)$
is a normed linear space and, as such, is both a vector space and a metric space.
The scalars may be real or complex.

Recall from Section V.9 that a linear functional on $L^p(X, \mu)$ is a linear function
from $L^p(X, \mu)$ into the scalars. Proposition 5.57 shows that a linear functional
$x^*$ is continuous if and only if it is bounded in the sense that $|x^*(f)| \le C\|f\|_p$
for some constant $C$ and all $f$ in $L^p$. The inequality $|x^*(f)| \le C\|f\|_p$ holds for
all $f$ in $L^p$ if and only if it holds for all $f$ with $\|f\|_p \le 1$, if and only if it holds
for all $f$ with $\|f\|_p = 1$. If there is such a constant $C$, then the finite number
\[
\|x^*\| = \sup_{\|f\|_p \le 1} |x^*(f)| = \sup_{\|f\|_p = 1} |x^*(f)|
\]
is the least such constant $C$ and is called the norm of $x^*$. Since $\|x^*\|$ is one such
constant $C$, we have
\[
|x^*(f)| \le \|x^*\|\,\|f\|_p.
\]

Let $p'$ be the dual index to $p$, defined by $\frac{1}{p} + \frac{1}{p'} = 1$. Each member $g$
of $L^{p'}(X, \mu)$ provides an example of a continuous linear functional on $L^p$ by
the formula $x^*(f) = \int_X fg\,d\mu$. The linear functional $x^*$ is bounded, hence
continuous, as a consequence of Hölder's inequality: $\big|\int_X fg\,d\mu\big| \le \|g\|_{p'}\|f\|_p$.
This inequality shows that $\|x^*\| \le \|g\|_{p'}$. Proposition 9.8 shows that equality
$\|x^*\| = \|g\|_{p'}$ holds if $\mu$ is $\sigma$-finite.
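
When $1 < p < \infty$ and $g \ne 0$ is real-valued, the equality $\|x^*\| = \|g\|_{p'}$ can be seen concretely: the function $f = \operatorname{sgn}(g)\,|g|^{p'-1} / \|g\|_{p'}^{p'-1}$ has $\|f\|_p = 1$ and $\int_X fg\,d\mu = \|g\|_{p'}$. The sketch below checks this numerically on a three-point measure space; the measure, the function $g$, and all names are hypothetical choices, and floating-point comparisons use a small tolerance.

```python
def lp_norm(g, mu, p):
    """(sum of |g|^p * mu)^(1/p) on a finite space with point masses mu."""
    return sum(abs(g[x]) ** p * mu[x] for x in mu) ** (1.0 / p)

def pairing(f, g, mu):
    """The functional x*(f) = integral of f * g with respect to mu."""
    return sum(f[x] * g[x] * mu[x] for x in mu)

def extremal(g, mu, p):
    """Unit vector f of L^p with pairing(f, g, mu) = ||g||_{p'} (real g != 0, 1 < p < inf)."""
    q = p / (p - 1.0)  # the dual index p'
    norm_q = lp_norm(g, mu, q)

    def sign(t):
        return (t > 0) - (t < 0)

    return {x: sign(g[x]) * abs(g[x]) ** (q - 1.0) / norm_q ** (q - 1.0) for x in mu}

# Hypothetical data.
mu = {0: 0.5, 1: 2.0, 2: 1.0}
g = {0: 1.0, 1: -2.0, 2: 0.5}
p = 3.0
q = p / (p - 1.0)
f = extremal(g, mu, p)

assert abs(lp_norm(f, mu, p) - 1.0) < 1e-9                     # f is a unit vector in L^p
assert abs(pairing(f, g, mu) - lp_norm(g, mu, q)) < 1e-9       # the Holder bound is attained
```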

Theorem 9.19 gives a converse when $1 \le p < \infty$, saying that there are no
other examples of continuous linear functionals if $\mu$ is $\sigma$-finite. By contrast, there
can be other examples in the case of $L^\infty(X, \mu)$. For example, for the situation
in which $X$ is the set of positive integers and $\mathcal{A}$ consists of all subsets of $X$ and
$\mu$ is the counting measure, Problems 39-43 at the end of Chapter V show how
to construct a bounded additive set function on $\mathcal{A}$ that is not completely additive,
and they show how this set function leads to a notion of integration (hence a linear
functional) on this $L^\infty$ space; this linear functional is not given by an $L^1$ function.


Theorem 9.19 (Riesz Representation Theorem for $L^p$). Let $(X, \mathcal{A}, \mu)$ be a
$\sigma$-finite measure space, let $1 \le p < \infty$, and let $p'$ be the dual index to $p$. If $x^*$ is
a continuous linear functional on $L^p(X, \mu)$, then there exists a unique member $g$
of $L^{p'}(X, \mu)$ such that
\[
x^*(f) = \int_X fg\,d\mu
\]
for all $f$ in $L^p$. For this function $g$, $\|x^*\| = \|g\|_{p'}$.


REMARKS. For $1 \le p < \infty$, Proposition 9.9 shows that $L^p(V, \mu)$ is separable
if $\mu$ is a Borel measure on an open subset $V$ of $\mathbb{R}^N$. For this or any other setting
in which any of these $L^p$ spaces is separable, Alaoglu's Theorem (Theorem
5.58) says that any bounded sequence in $L^p(V, \mu)^*$ has a weak-star convergent
subsequence. Because of Theorem 9.19 we know what the members of the dual
space are. Thus any bounded sequence in $L^{p'}$ has a subsequence that is convergent
weak-star against $L^p$. In effect we obtain a nonconstructive way of producing
members of $L^{p'}$. Problem 8 at the end of the chapter will illustrate the usefulness
of this technique.


PROOF OF UNIQUENESS. Write $X = \bigcup_{n=1}^\infty X_n$ disjointly with $\mu(X_n)$ finite
for all $n$. If $\int_X fg\,d\mu = 0$ for all $f$ in $L^p$, then $\int_X I_{A \cap X_n}\, g\,d\mu = 0$ for every
measurable subset $A$ of $X$. Taking $A$ successively to be each of the sets where
$\operatorname{Re} g$ or $\operatorname{Im} g$ is $\ge 0$ or is $\le 0$ and applying Corollary 5.23, we see that $g$ is 0
almost everywhere on $X_n$ for each $n$. Hence $g$ is 0 almost everywhere.


PROOF OF EXISTENCE IF $\mu(X)$ IS FINITE. Temporarily let us suppose that the
underlying scalars are real. Define a set function $\nu$ on $\mathcal{A}$ by $\nu(E) = x^*(I_E)$; $\nu$ is
well defined because every $I_E$ is in $L^p$, and $\nu$ is additive because $x^*$ is linear. If
$E_n$ is an increasing sequence of measurable sets with union $E$, then $\lim_n I_{E_n} = I_E$
pointwise, and hence $\lim_n |I_E - I_{E_n}|^p = 0$ pointwise. By dominated convergence,
$\lim_n \|I_E - I_{E_n}\|_p = 0$. Thus
\[
|\nu(E) - \nu(E_n)| = |x^*(I_E - I_{E_n})| \le \|x^*\|\,\|I_E - I_{E_n}\|_p,
\]
and the right side has limit 0. By Proposition 5.2, $\nu$ is completely additive.
The set function $\nu$ is bounded because $|\nu(E)| = |x^*(I_E)| \le \|x^*\|\,\|I_E\|_p =
\|x^*\|(\mu(E))^{1/p} \le \|x^*\|(\mu(X))^{1/p}$, and it satisfies $\nu \ll \mu$ because if $\mu(E) = 0$,
then $I_E$ is the 0 function of $L^p$ and thus $\nu(E) = x^*(I_E) = x^*(0) = 0$. By the
Radon-Nikodym Theorem in the form of Corollary 9.17, there exists an integrable
real-valued function $g$ such that $\nu(E) = \int_E g\,d\mu$ for all $E$, i.e.,


\[
x^*(I_E) = \int_X I_E\, g\,d\mu \qquad\text{for every measurable set } E.
\]
By linearity, this equality extends to show that $x^*(s) = \int_X sg\,d\mu$ for every
simple function $s$. Let $f \ge 0$ be in $L^p$, and choose an increasing sequence
$\{s_n\}$ of simple functions $\ge 0$ with pointwise limit $f$. We shall show that $fg$
is integrable and $x^*(f) = \int_X fg\,d\mu$. In fact, let $A$ be the set where $g(x) \ge 0$.
Then $\lim_n |fI_A - s_nI_A|^p = 0$ pointwise, and hence $\lim_n \|fI_A - s_nI_A\|_p = 0$ by
dominated convergence. Since
\[
|x^*(fI_A) - x^*(s_nI_A)| \le \|x^*\|\,\|fI_A - s_nI_A\|_p
\]
and since the right side tends to 0, the set $\{x^*(s_nI_A)\}$ of numbers is bounded.
Thus the set $\{\int_X s_nI_A g\,d\mu\}$ of equal numbers is bounded. Since $g \ge 0$ on $A$, the
functions $s_nI_Ag$ increase to $fI_Ag$, and thus $\int_X fI_Ag\,d\mu$ is finite by monotone
convergence. In other words, $fg^+$ is integrable. Similarly $fg^-$ is integrable, and
thus $fg$ is integrable. Since $\lim_n x^*(s_nI_A) = x^*(fI_A)$ and $\lim_n \int_X s_nI_Ag\,d\mu =
\int_X fI_Ag\,d\mu$ and since a similar result holds for $g^-$, we conclude that
\[
x^*(f) = \int_X fg\,d\mu \qquad\text{for all } f \ge 0 \text{ in } L^p.
\]
This conclusion, now proved for $f \ge 0$, immediately extends by linearity to all
$f$ in $L^p$ and completes the verification that $x^*(f) = \int_X fg\,d\mu$ in the case that
the scalars are real.

If the scalars are complex, we apply the above argument to the restrictions of
$\operatorname{Re} x^*$ and $\operatorname{Im} x^*$ to the real-valued functions in $L^p$, obtaining real-valued functions
$g_1$ and $g_2$ in $L^{p'}$ with $\operatorname{Re} x^*(f) = \int_X fg_1\,d\mu$ and $\operatorname{Im} x^*(f) = \int_X fg_2\,d\mu$ for all
real-valued $f$. Then $x^*(f) = \int_X f(g_1 + ig_2)\,d\mu$ for all real-valued $f$, and it
follows that this same equality is valid for all complex-valued $f$. Since $g_1$ and $g_2$
are in $L^{p'}$, so is $g_1 + ig_2$. This completes the verification that $x^*(f) = \int_X fg\,d\mu$
for a suitable $g$ in the case that the scalars are complex.

Finally Proposition 9.8 shows that $\|x^*\| = \|g\|_{p'}$ and completes the proof of
the theorem under the assumption that $\mu(X)$ is finite.


PROOF OF EXISTENCE IF $\mu$ IS $\sigma$-FINITE. Again we temporarily suppose
that the underlying scalars are real. Since $\mu$ is $\sigma$-finite, we can write $X$ as the
increasing union of sets $E_n$ of finite measure. Let $L^p_n$ be the set of members of $L^p$
that vanish off $E_n$, and let $x^*_n$ be the restriction of $x^*$ to $L^p_n$. Find, by the special
case just completed, a function $g_n$ for each $n$ such that $x^*_n(f_n) = \int_{E_n} f_ng_n\,d\mu$ for
all $f_n$ in $L^p_n$. The already proved uniqueness result implies that the restriction of
$g_{n+1}$ to $E_n$ equals $g_n$ almost everywhere $[d\mu]$. Let $g$ be the measurable function
equal to $g_1$ on $E_1$ and equal to $g_n$ on $E_n - E_{n-1}$ if $n \ge 2$. Let $A$ be the set where
$g(x) \ge 0$, and let $f \ge 0$ be in $L^p$. Then $fI_{E_n \cap A}$ increases to $fI_A$, and dominated
convergence implies that $\lim_n \|fI_{E_n \cap A} - fI_A\|_p = 0$. Since $fI_{E_n \cap A}\,g$ increases
pointwise to $fI_Ag$, monotone convergence gives
\[
\int_X fI_Ag\,d\mu = \lim_n \int_X fI_{E_n \cap A}\,g\,d\mu = \lim_n \int_X fI_{E_n \cap A}\,g_n\,d\mu
= \lim_n x^*_n(fI_{E_n \cap A}) = \lim_n x^*(fI_{E_n \cap A}) = x^*(fI_A),
\]
the last equality holding since $\|fI_{E_n \cap A} - fI_A\|_p$ tends to 0. Hence $fg^+$ is
integrable. By proceeding similarly with the set where $g(x) < 0$ and by writing
a general $f$ as $f = f^+ - f^-$, we conclude that $fg$ is integrable for every $f$ in
$L^p$ and $x^*(f) = \int_X fg\,d\mu$, provided the scalars are real.

Again there is no difficulty in extending the argument to the case that the scalars
are complex, and Proposition 9.8 shows that $\|x^*\| = \|g\|_{p'}$.


6. Marcinkiewicz Interpolation Theorem 


This section concerns linear functions and some almost-linear functions between
$L^p$ spaces. We saw evidence in Proposition 9.5 that the $L^p$ spaces behave
collectively like a well-behaved family of spaces. That result specifically gave
an upper bound for $\|f\|_p$ in terms of $\|f\|_{p_1}$ and $\|f\|_{p_2}$ when $p_1 \le p \le p_2$. It
turns out that linear functions between pairs of $L^p$ spaces satisfy inequalities of
a similar sort.

There are two classes of results in this direction. Results of the first kind use
methods of complex analysis, address bounded linear operators only, and give
estimates for a one-parameter family of operators that are sharp at the ends. The
main result of this kind is the "Riesz¹ Convexity Theorem," whose precise general
statement and proof we omit. The thrust of the theorem is that if a linear operator
$T$ satisfies the two estimates $\|T(f)\|_{q_1} \le M_1\|f\|_{p_1}$ and $\|T(f)\|_{q_2} \le M_2\|f\|_{p_2}$,
then $T$ satisfies also an estimate $\|T(f)\|_q \le M\|f\|_p$ for all pairs $(p, q)$ such that
$\big(\frac{1}{p}, \frac{1}{q}\big)$ lies on the line segment in the $\big(\frac{1}{p}, \frac{1}{q}\big)$ plane from $\big(\frac{1}{p_1}, \frac{1}{q_1}\big)$ to $\big(\frac{1}{p_2}, \frac{1}{q_2}\big)$. The
conclusion gives also some specific information about $M$.

The existence of some $M$ in the Riesz theorem can be obtained in most cases
of interest by a corresponding real-analysis result known as the "Marcinkiewicz
Interpolation Theorem." We include below a statement of the Marcinkiewicz
theorem in general and the proof in a special case of exceptional interest. The
Marcinkiewicz theorem imposes some restrictions on the pairs $(p_1, q_1)$ and
$(p_2, q_2)$ that are not needed in the Riesz theorem, but situations that do not
satisfy these restrictions are of comparatively little interest in applications. In
any event, in the situations where the Marcinkiewicz theorem applies, it is only
the specific information about $M$ in the Riesz theorem that does not come out of
the real-analysis proof of the Marcinkiewicz theorem.

¹The person in question here is Marcel Riesz, whose name is associated also with convergence
of the partial sums of the Fourier series of an $L^p$ function in $L^p$ for $1 < p < \infty$. The other mentions
of the name "Riesz" in this book, namely in connection with the Rising Sun Lemma of Section VII.1
and various results known as the Riesz Representation Theorem, refer to Frigyes Riesz.

Let us mention without proof two consequences of the Riesz Convexity
Theorem: the Hausdorff-Young Theorem and Young's inequality.

The linear operator $T$ in the Hausdorff-Young Theorem is the Fourier transform
$\mathcal{F}$, and the instances of the theorem that we knew previously are when
$(p, p')$ equals $(1, \infty)$ or $(2, 2)$. The numerology that allows the Riesz Convexity
Theorem to apply is that
\[
\frac{1}{p} = \frac{1-t}{1} + \frac{t}{2}
\qquad\text{and}\qquad
\frac{1}{p'} = \frac{1-t}{\infty} + \frac{t}{2}
\]
for the same $t$:

HAUSDORFF-YOUNG THEOREM. If $1 \le p \le 2$ and if $p'$ is the dual
index, then the Fourier transform $\mathcal{F}$, initially defined on the dense
subspace $L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$ of $L^p(\mathbb{R}^N)$, satisfies
\[
\|\mathcal{F}f\|_{p'} \le \|f\|_p
\]
for such $f$ and therefore extends to all of $L^p(\mathbb{R}^N)$ in such a way that
this same inequality holds.


If one tries to derive the Hausdorff-Young Theorem from the Marcinkiewicz
Interpolation Theorem, one gets only the conclusion $\|\mathcal{F}(f)\|_{p'} \le M\|f\|_p$, without
the improvement on the bound: $M \le 1$.
The linear operator $T$ in Young's inequality can be taken to be $g \mapsto f * g$ with
$f$ fixed in $L^p$. The instances of the inequality that we knew previously are when
$(q, r)$ equals $(1, p)$ or $(p', \infty)$. The relevant numerology is that
\[
\frac{1}{q} = \frac{1-t}{1} + \frac{t}{p'}
\qquad\text{and}\qquad
\frac{1}{r} = \frac{1-t}{p} + \frac{t}{\infty}
\]
for the same $t$:


YOUNG'S INEQUALITY. Let $p$, $q$, and $r$ be three indices $\ge 1$ and $\le \infty$
such that $\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1$. Then convolution $f * g$ is well defined for
$f$ in $L^p(\mathbb{R}^N)$ and $g$ in $L^q(\mathbb{R}^N)$, and it satisfies
\[
\|f * g\|_r \le \|f\|_p\,\|g\|_q.
\]
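
The relation among the three indices is easy to experiment with numerically. The sketch below uses the analog of Young's inequality for finitely supported sequences on the integers with counting measure, where the same inequality is known to hold with constant 1; this discrete setting, the sample sequences, and the function names are assumptions made for illustration and are not the $\mathbb{R}^N$ statement above.

```python
def convolve(f, g):
    """Convolution of two finitely supported sequences, indexed from 0."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def norm(seq, p):
    """The l^p norm of a finite sequence, with p = inf giving the sup norm."""
    if p == float("inf"):
        return max(abs(t) for t in seq)
    return sum(abs(t) ** p for t in seq) ** (1.0 / p)

def young_exponent(p, q):
    """Solve 1/r = 1/p + 1/q - 1 for r, for finite p and q with 1/p + 1/q >= 1."""
    inv_r = 1.0 / p + 1.0 / q - 1.0
    return float("inf") if inv_r <= 0 else 1.0 / inv_r

# Hypothetical sample data: 1/r = 2/3 + 5/6 - 1 = 1/2, so r is 2 up to rounding.
f = [1.0, -2.0, 0.5, 3.0]
g = [0.5, 0.25, -1.0]
p, q = 1.5, 1.2
r = young_exponent(p, q)

assert norm(convolve(f, g), r) <= norm(f, p) * norm(g, q) + 1e-12
```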
By way of preparation for the statement of the Marcinkiewicz theorem, let
$(X, \mathcal{A}, \mu)$ and $(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces, and let $T$ be a function from
a vector subspace of measurable functions on $X$, modulo sets² of $\mu$ measure 0,
into measurable functions on $Y$, modulo sets of $\nu$ measure 0. We say that $T$ is a
sublinear operator if $|T(f + g)| \le |T(f)| + |T(g)|$ for all $f$ and $g$ in the domain
of $T$.

The two examples of $T$ to keep in mind are the sublinear operator $f \mapsto f^*$ in
$\mathbb{R}^N$ of passing to the Hardy-Littlewood maximal function, as in Section VI.6, and
the linear operator $f \mapsto H_1f$ in $\mathbb{R}^1$ of forming a certain approximation $H_1$ to the
Hilbert transform, as in Section VIII.7. More specifically the Hardy-Littlewood
maximal function of a locally integrable function $f$ on $\mathbb{R}^N$ is defined as
\[
f^*(x) = \sup_{0 < r < \infty} m(B_r)^{-1} \int_{B_r} |f(x - y)|\,dy, \qquad\text{where } B_r = B(r; 0) \text{ in } \mathbb{R}^N,
\]
and the sublinear operator $T$ is $Tf = f^*$. The approximation $H_1$ to the Hilbert
transform is defined for $f$ in $L^1 + L^2$ by
\[
H_1f(x) = h_1 * f(x) = \frac{1}{\pi} \int_{|t| \ge 1} \frac{f(x - t)}{t}\,dt
\]
as the convolution with a fixed $L^2$ function.

Let $1 \le p, q \le \infty$. We generalize the notion of boundedness of a linear
operator between $L^p(X)$ and $L^q(Y)$ so that we can work with sublinear operators
as well as linear ones. A sublinear operator $T$ is said to be of type $(p, q)$ or
strong type $(p, q)$ if $\|Tf\|_q \le M\|f\|_p$ with $M$ finite and independent of $f$. The
least $M$ for which this inequality holds is called the norm or operator norm of
$T$. If $q < \infty$, then Chebyshev's inequality from Section VI.10 gives
\[
\nu(\{y \in Y \mid |(Tf)(y)| > \xi\}) \le \frac{\int_Y |Tf|^q\,d\nu}{\xi^q},
\]
and for any $M$ such that $\|Tf\|_q \le M\|f\|_p$ for all $f$, it follows that
\[
\nu(\{y \in Y \mid |(Tf)(y)| > \xi\}) \le \Big(\frac{M\|f\|_p}{\xi}\Big)^q.
\]
If $q < \infty$, a sublinear operator $T$ is said to be of weak type $(p, q)$ if it satisfies
\[
\nu(\{y \in Y \mid |(Tf)(y)| > \xi\}) \le \Big(\frac{M\|f\|_p}{\xi}\Big)^q
\]
for some $M$. In this case the least such $M$ is called the weak-type norm of $T$. We
already encountered the definition of weak type $(1, 1)$ in Section VI.6. If $q = \infty$,
the convention is that weak type $(p, \infty)$ is the same as strong type $(p, \infty)$.

²This condition means that the domain of $T$ is to be regarded as a vector subspace of measurable
functions, except that two functions are identified if they differ only on a set of measure 0.
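
The gap between the weak-type and strong-type conditions already shows up for a single function. The sketch below works on $\{1, \dots, n\}$ with counting measure and the hypothetical function $F(y) = 1/y$: the quantity $\sup_{\xi > 0} \xi \cdot \nu(\{|F| > \xi\})$ stays equal to 1 for every $n$, while $\|F\|_1$ grows like $\log n$. The function names are illustrative, and the quantity computed is the weak bound for one function rather than the weak-type norm of an operator.

```python
def distribution(F, xi):
    """Counting measure of the set where |F| > xi."""
    return sum(1 for v in F if abs(v) > xi)

def weak_quasinorm(F, q=1.0):
    """sup over xi > 0 of xi * distribution(F, xi)^(1/q) for a finite list F.

    The distribution function is a step function, so the supremum is
    approached as xi increases to one of the values |F(y)| from below.
    """
    values = sorted({abs(v) for v in F if v != 0})
    return max(v * sum(1 for w in F if abs(w) >= v) ** (1.0 / q) for v in values)

n = 1000
F = [1.0 / y for y in range(1, n + 1)]

# Chebyshev's inequality with q = 1 at the level xi = 0.1.
assert distribution(F, 0.1) <= sum(abs(v) for v in F) / 0.1
```

Here `weak_quasinorm(F)` is 1 up to rounding, whereas `sum(F)` exceeds 7 when `n` is 1000.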
Consider our two examples. The operation $T(f) = f^*$ of passing to the
Hardy-Littlewood maximal function in $\mathbb{R}^N$ is of weak type $(1, 1)$ by the
Hardy-Littlewood Maximal Theorem (Theorem 6.38), and the evident inequality
\[
\sup_{0 < r < \infty} \Big| m(B_r)^{-1} \int_{B_r} |f(x - y)|\,dy \Big| \le \|f\|_\infty
\]
shows that $f \mapsto f^*$ is of type $(\infty, \infty)$ as well. The linear operator $T(f) = H_1f$
of passing to the approximation $H_1$ to the Hilbert transform in $\mathbb{R}^1$ is of weak type
$(1, 1)$ and type $(2, 2)$ by Theorem 8.25.
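
The type $(\infty, \infty)$ bound for the maximal function can be seen in a simple discrete analog, in which centered balls are replaced by windows $\{x - r, \dots, x + r\}$ of integers. The sketch below is an assumption-laden illustration and not the operator on $\mathbb{R}^N$: the sequence is finitely supported, is treated as 0 outside its index range, and the maximal function is evaluated only at the indices of the input.

```python
def maximal_function(f):
    """Discrete centered maximal function of a finite sequence, extended by 0."""
    n = len(f)
    result = []
    for x in range(n):
        best = 0.0
        # Windows of radius r >= n - 1 already contain the whole support,
        # so larger radii can only decrease the average.
        for r in range(n):
            lo, hi = max(0, x - r), min(n - 1, x + r)
            average = sum(abs(f[y]) for y in range(lo, hi + 1)) / (2 * r + 1)
            best = max(best, average)
        result.append(best)
    return result

# Hypothetical input: a single spike of height 4.
f = [0.0, 0.0, 4.0, 0.0, 0.0]
M = maximal_function(f)

assert all(M[x] >= abs(f[x]) for x in range(len(f)))   # radius 0 recovers |f(x)|
assert max(M) <= max(abs(v) for v in f)                # the sup-norm bound
```

For the spike the values decay like the reciprocal of the window length, which is the behavior responsible for the failure of a strong type $(1, 1)$ estimate.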


Theorem 9.20 (Marcinkiewicz Interpolation Theorem). Let $(X, \mathcal{A}, \mu)$ and
$(Y, \mathcal{B}, \nu)$ be $\sigma$-finite measure spaces, and let $(p_1, q_1)$ and $(p_2, q_2)$ be two pairs of
indices between 1 and $\infty$. Suppose that $1 \le p_1 \le q_1 \le \infty$, $1 \le p_2 \le q_2 \le \infty$,
and $p_1 \ne p_2$. Let $T$ be a sublinear operator from $L^{p_1}(X, \mu) + L^{p_2}(X, \mu)$ to the
space of measurable functions on $Y$ modulo sets of $\nu$ measure 0, and suppose
that $T$ is of weak types $(p_1, q_1)$ and $(p_2, q_2)$ with respective weak-type norms
$M_1$ and $M_2$. Fix $t$ with $0 < t < 1$, and define $(p, q)$ by
\[
\frac{1}{p} = \frac{1-t}{p_1} + \frac{t}{p_2}
\qquad\text{and}\qquad
\frac{1}{q} = \frac{1-t}{q_1} + \frac{t}{q_2}.
\]
Then $T$ is of strong type $(p, q)$ with
\[
\|Tf\|_q \le C\|f\|_p \qquad\text{for all } f \in L^p(X, \mu),
\]
with the constant $C$ depending only on $t, M_1, M_2, p_1, q_1, p_2, q_2$ and with $C$
bounded as a function of $t$ as long as $t$ is bounded away from 0 and 1.
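
The interpolated pair $(p, q)$ is determined by linear interpolation of the reciprocals, with $1/\infty$ read as 0. A small helper makes the numerology easy to tabulate; the function names below are hypothetical, and exact fractions are used so that the indices come out exactly.

```python
from fractions import Fraction

def inv(p):
    """1/p as an exact fraction, with the convention 1/inf = 0."""
    return Fraction(0) if p == float("inf") else 1 / Fraction(p)

def interpolate(p1, q1, p2, q2, t):
    """Return (p, q) with 1/p = (1-t)/p1 + t/p2 and 1/q = (1-t)/q1 + t/q2."""
    inv_p = (1 - t) * inv(p1) + t * inv(p2)
    inv_q = (1 - t) * inv(q1) + t * inv(q2)

    def to_index(s):
        return float("inf") if s == 0 else 1 / s

    return to_index(inv_p), to_index(inv_q)

INF = float("inf")
half = Fraction(1, 2)

# Between the endpoint pairs (1, 1) and (2, 2), the midpoint t = 1/2 gives p = q = 4/3.
assert interpolate(1, 1, 2, 2, half) == (Fraction(4, 3), Fraction(4, 3))
# Between (1, 1) and (inf, inf), the midpoint gives p = q = 2.
assert interpolate(1, 1, INF, INF, half) == (2, 2)
```

With endpoints $(1, \infty)$ and $(2, 2)$ the same helper reproduces the Hausdorff-Young numerology: `interpolate(1, INF, 2, 2, half)` gives $p = 4/3$ and $q = 4$, which is the dual index of $4/3$.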


Before discussing the proof, let us apply the theorem to our two examples,
the Hardy-Littlewood maximal function and the approximation $H_1$ to the Hilbert
transform. Then let us draw some consequences of these applications. As was
said before the statement of Theorem 9.20, the sublinear operator $f \mapsto f^*$ is
of weak type $(1, 1)$ and strong type $(\infty, \infty)$. The theorem immediately gives the
following corollary.


Corollary 9.21. If $1 < p \le \infty$, then there exists a constant $A_p$ such that the
Hardy-Littlewood maximal function satisfies
\[
\|f^*\|_p \le A_p\|f\|_p
\]
for all $f$ in $L^p(\mathbb{R}^N)$.

The case of this result in one dimension implies something in $N$ dimensions
that we have not obtained earlier. If $f$ is locally integrable on $\mathbb{R}^N$, one says that
strong differentiation holds for $f$ at $x$ if
\[
\lim_{\substack{\operatorname{diam}(R) \to 0,\\ R = \text{geometric rectangle}\\ \text{centered at } x}} \frac{1}{m(R)} \int_R f(y)\,dy = f(x).
\]
A consequence of Corollary 9.21 is that strong differentiation holds almost
everywhere for each $f$ in $L^p(\mathbb{R}^N)$ for $p > 1$. The proof is outlined in Problems
13-15 at the end of the chapter. By contrast, it is known that there are functions
in $L^1(\mathbb{R}^N)$ for which strong differentiation fails everywhere.

In the second example the operator $H_1$ that approximates the Hilbert transform
is of weak type $(1, 1)$ and strong type $(2, 2)$, and Theorem 9.20 allows us to
conclude that it is of strong type $(p, p)$ for $1 < p \le 2$. But we can do better.
The operator $H_1$ is convolution by the function $h_1$ with $h_1(x) = 1/(\pi x)$ for
$|x| \ge 1$ and $h_1(x) = 0$ for $|x| < 1$. The function $h_1$ is in $L^p$ for all $p > 1$,
and Proposition 9.10f shows that $h_1 * f$ is well defined as a bounded continuous
function whenever $f$ is in some $L^q$ with $1 \le q < \infty$. Thus $H_1$ is defined on
all $L^p$ classes for $1 \le p < \infty$, and a general result that we prove below as
Lemma 9.22 shows that an inequality $\|H_1f\|_p \le A_p\|f\|_p$ for all $f$ in $L^p$ implies
$\|H_1g\|_{p'} \le A_p\|g\|_{p'}$ for all $g$ in $L^{p'}$, provided $p'$ is the dual index to $p$ and
$1 < p < \infty$. Thus the boundedness result for $H_1$ on $L^p$ extends to $1 < p < \infty$.

Next, we define the dilate $h_\varepsilon(x) = \varepsilon^{-1}h_1(\varepsilon^{-1}x)$ in the usual way and put $H_\varepsilon f =
h_\varepsilon * f$. We shall see for every $\varepsilon > 0$ that $\|H_\varepsilon f\|_p \le A_p\|f\|_p$ with the same
constant $A_p$, and finally we shall see that we can let $\varepsilon$ decrease to 0 and obtain
the Hilbert transform $H$ as a well-defined linear operator on all $L^p$ classes for
$1 < p < \infty$; the estimate is $\|Hf\|_p \le A_p\|f\|_p$, again for the same $A_p$. Problems
20-22 at the end of the chapter indicate how to use this boundedness to prove
that the Fourier series of any $L^p$ function on $[-\pi, \pi]$ converges to the function
in $L^p$ if $1 < p < \infty$.
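
That the dilate of $h_1$ is again a truncated kernel, namely $1/(\pi x)$ for $|x| \ge \varepsilon$ and 0 otherwise, follows from $\varepsilon^{-1} \cdot \varepsilon/(\pi x) = 1/(\pi x)$. The short check below confirms the identity at a few sample points away from the cutoff; the function names and the sample values are hypothetical.

```python
import math

def h(eps, x):
    """Truncated kernel: 1/(pi x) for |x| >= eps, and 0 for |x| < eps."""
    return 1.0 / (math.pi * x) if abs(x) >= eps else 0.0

def dilate_of_h1(eps, x):
    """The dilate eps^{-1} * h_1(eps^{-1} * x) of the kernel h_1."""
    return h(1.0, x / eps) / eps

eps = 0.5
# Sample points chosen away from the cutoff |x| = eps to avoid rounding at the boundary.
for x in (-3.0, -0.75, -0.25, 0.125, 0.75, 2.0):
    assert math.isclose(h(eps, x), dilate_of_h1(eps, x), rel_tol=1e-12, abs_tol=0.0)
```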


Lemma 9.22. Fix $p$ with $1 < p < \infty$, let $p'$ be the dual index, and suppose
that $h$ is in $L^p(\mathbb{R}^N) \cap L^{p'}(\mathbb{R}^N)$. If $\|h * f\|_p \le A_p\|f\|_p$ for all $f$ in $L^p(\mathbb{R}^N)$, then
$\|h * g\|_{p'} \le A_p\|g\|_{p'}$ for all $g$ in $L^{p'}$.


REMARKS. Since $h$ is in $L^{p'}$, $h * f$ is in $L^\infty$ when $f$ is in $L^p$. Thus $h * f$ is
well defined, and it is meaningful to say that $h * f$ is actually in $L^p$. When $h * f$
is in $L^p$, the integral $\int (h * f)g\,dx$ is well defined for $g$ in $L^{p'}$. A little care is
required in working with this integral in the proof because $\int (|h| * f)g\,dx$ need
not be well defined and Fubini's Theorem may not be directly applicable.

PROOF. For any function $F$ on $\mathbb{R}^N$, define $F^{\#}(x) = F(-x)$ and observe that
$\|F^{\#}\|_r = \|F\|_r$ for $1 \le r \le \infty$. If $g$ is an integrable simple function, then
$(h^{\#} * g)(x) = \int h(y - x)g(y)\,dy = \int h(-y - x)g^{\#}(y)\,dy = (h * g^{\#})(-x)$. Thus
this $g$ and an integrable simple function $f$ together satisfy
\[
\int (h * f^{\#})(x)g(x)\,dx = \iint h(x - y)f^{\#}(y)g(x)\,dy\,dx
= \iint h(x + y)f(y)g(x)\,dy\,dx
\]
and
\[
\int (h * g^{\#})(y)f(y)\,dy = \int (h^{\#} * g)(-y)f(y)\,dy
= \iint h^{\#}(-y - x)g(x)f(y)\,dx\,dy
= \iint h(x + y)g(x)f(y)\,dx\,dy.
\]
Because $f$ and $g$ are in every $L^r$ class, the right sides of these two displays are
finite when absolute value signs are inserted in the integrands. Thus Fubini's
Theorem applies and shows that the two right sides are equal. Combining this
fact with Hölder's inequality and the hypothesis about $h$, we obtain
\[
\Big| \int (h * g^{\#})(y)f(y)\,dy \Big| = \Big| \int (h * f^{\#})(x)g(x)\,dx \Big|
\le \|h * f^{\#}\|_p\|g\|_{p'} \le A_p\|f^{\#}\|_p\|g\|_{p'} = A_p\|f\|_p\|g\|_{p'}
\]

whenever $f$ and $g$ are integrable simple functions. If a general $f_0$ in $L^p$ is given,
we can find a sequence $f_n$ of integrable simple functions such that $\|f_n - f_0\|_p \to 0$,
and we apply this inequality to each $f_n$. Then the left side of the inequality tends
to $\big| \int (h * g^{\#})(y)f_0(y)\,dy \big|$, and the right side tends to $A_p\|f_0\|_p\|g\|_{p'}$. Taking the
supremum over all $f_0$ with $\|f_0\|_p \le 1$ and applying Proposition 9.8, we find that
$\|h * g^{\#}\|_{p'} \le A_p\|g\|_{p'} = A_p\|g^{\#}\|_{p'}$. In other words,
\[
\|h * g_n\|_{p'} \le A_p\|g_n\|_{p'}
\]
for every integrable simple function $g_n$. For a general $g$ in $L^{p'}$, choose a sequence
of integrable simple functions $g_n$ with $\|g_n - g\|_{p'} \to 0$. Since $h$ is in $L^p$, it follows
from Proposition 9.10f that $h * g_n$ converges to $h * g$ uniformly. On the other
hand, the inequality $\|h * (g_m - g_n)\|_{p'} \le A_p\|g_m - g_n\|_{p'}$ shows that $\{h * g_n\}$ is
Cauchy in $L^{p'}$. By Theorem 5.58, $\{h * g_n\}$ converges to some function in $L^{p'}$ and
has an almost-everywhere convergent subsequence to this function. Since $h * g_n$
converges uniformly to $h * g$, we conclude that $h * g_n$ converges to $h * g$ in $L^{p'}$.
Therefore $\|h * g\|_{p'} \le A_p\|g\|_{p'}$, and the proof is complete.


Again let $h_1$ be the function on $\mathbb{R}^1$ equal to $1/(\pi x)$ for $|x| \ge 1$ and equal
to 0 for $|x| < 1$. This is in $L^r(\mathbb{R}^1)$ for every $r > 1$. Our operator giving an
approximation to the Hilbert transform is $H_1f = h_1 * f$. Using our results from
Chapter VIII along with the Marcinkiewicz Interpolation Theorem, we saw earlier
in this section that $H_1$ satisfies $\|H_1f\|_p \le A_p\|f\|_p$ for $1 < p \le 2$ and all $f$ in
$L^p(\mathbb{R}^1)$. Lemma 9.22 shows that this inequality remains valid for $1 < p < \infty$.
From this result we can extend the Hilbert transform to $L^p(\mathbb{R}^1)$ for all $p$ with
$1 < p < \infty$, as follows.


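Theorem 9.23. Let $1 < p < \infty$, let

\[
h_\varepsilon(x) = \varepsilon^{-1} h_1(\varepsilon^{-1} x) =
\begin{cases}
1/(\pi x) & \text{for } |x| \ge \varepsilon, \\
0 & \text{for } |x| < \varepsilon,
\end{cases}
\]

and define $H_\varepsilon f = h_\varepsilon * f$ for $f$ in $L^p$ and $\varepsilon > 0$. Then
(a) there exists a constant $A_p$ independent of $\varepsilon$ such that $\|H_\varepsilon f\|_p \le A_p\|f\|_p$ for all $f$ in $L^p$,
(b) the limit

\[
Hf(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\pi} \int_{|t| \ge \varepsilon} \frac{f(x - t)\,dt}{t}
\]

exists in $L^p$ for every $f$ in $L^p$,
(c) the operator $H$ satisfies $\|Hf\|_p \le A_p\|f\|_p$ for every $f$ in $L^p$.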


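PROOF. Convolution with $h_\varepsilon$ is well defined on $L^p$ because $h_\varepsilon$ is in $L^{p'}$, $p'$ being the dual index for $p$. The three computations

\[
\begin{aligned}
H_\varepsilon f(x) = (f * h_\varepsilon)(x) &= \int f(x - y)\,\varepsilon^{-1} h_1(\varepsilon^{-1} y)\,dy = \int f(x - \varepsilon y)\,h_1(y)\,dy \\
&= \int \varepsilon^{-1} f_{\varepsilon^{-1}}(\varepsilon^{-1} x - y)\,h_1(y)\,dy = \varepsilon^{-1} (H_1 f_{\varepsilon^{-1}})(\varepsilon^{-1} x),
\end{aligned}
\]

\[
\int |H_\varepsilon f(x)|^p\,dx = \varepsilon^{-p} \int |(H_1 f_{\varepsilon^{-1}})(\varepsilon^{-1} x)|^p\,dx = \varepsilon^{1-p} \int |(H_1 f_{\varepsilon^{-1}})(x)|^p\,dx,
\]

and

\[
\int |g_{\varepsilon^{-1}}(x)|^p\,dx = \varepsilon^{p} \int |g(\varepsilon x)|^p\,dx = \varepsilon^{p-1} \int |g(x)|^p\,dx
\]

allow us to write

\[
\|H_\varepsilon f\|_p^p = \varepsilon^{1-p}\,\|H_1 f_{\varepsilon^{-1}}\|_p^p \le A_p^p\,\varepsilon^{1-p}\,\|f_{\varepsilon^{-1}}\|_p^p = A_p^p\,\|f\|_p^p.
\]

This proves (a), the constant $A_p$ being any constant that works for $H_1$.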

In Lemma 9.24 below we show by a direct computation that (b) holds for the dense subset of $C^1$ functions $f$ of compact support. Let us deduce (b) for general $f$ in $L^p$ from this fact and (a). In fact, if we are given $f$, we choose a sequence $f_n$ in the dense set with $f_n \to f$ in $L^p$. Then

\[
\begin{aligned}
\|H_\varepsilon f - H_{\varepsilon'} f\|_p &\le \|H_\varepsilon (f - f_n)\|_p + \|H_\varepsilon f_n - H_{\varepsilon'} f_n\|_p + \|H_{\varepsilon'} (f_n - f)\|_p \\
&\le A_p\|f_n - f\|_p + \|H_\varepsilon f_n - H_{\varepsilon'} f_n\|_p + A_p\|f_n - f\|_p.
\end{aligned}
\]

Choose $n$ to make the first and third terms small on the right, and then choose $\varepsilon$ and $\varepsilon'$ sufficiently close to 0 so that the second term on the right is small. The result is that $H_{\varepsilon_n} f$ is Cauchy in $L^p$ along any sequence $\varepsilon_n$ tending to 0. This proves (b), apart from the direct computation for the dense subset.

In (b), we proved that $H_\varepsilon f \to Hf$ in $L^p$. Then (a) gives $\|Hf\|_p = \lim_{\varepsilon \downarrow 0} \|H_\varepsilon f\|_p \le \limsup_{\varepsilon \downarrow 0} A_p\|f\|_p = A_p\|f\|_p$. This proves (c) and completes the proof of Theorem 9.23 except for the following lemma.


Lemma 9.24. If $f$ is a $C^1$ function of compact support on $\mathbb{R}^1$, then

\[
\lim_{\varepsilon \downarrow 0} \frac{1}{\pi} \int_{|t| \ge \varepsilon} \frac{f(x - t)\,dt}{t}
\]

exists uniformly and in $L^p$ for every $p > 1$.


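PROOF. Let $\|\cdot\|$ denote the supremum norm or the $L^p$ norm. By the Cauchy criterion it is enough to show that

\[
\Bigl\| \int_{\varepsilon_1 \le |t| \le \varepsilon_2} \frac{f(x - t)\,dt}{t} \Bigr\|
\]

tends to 0 for the above interpretations of $\|\cdot\|$ as $\varepsilon_1$ and $\varepsilon_2$ tend to 0. Since $|f'(u)| \le M$, use of the Mean Value Theorem on $\operatorname{Re} f$ and $\operatorname{Im} f$ shows that $|f(x - t) - f(x)| \le 2M|t|$. Suppose that $0 < \varepsilon_1 \le \varepsilon_2 \le 1$. If $E$ is a compact set containing the sum of any member of the support of $f$ and any $x$ with $|x| \le 1$, then it follows that

\[
\begin{aligned}
\Bigl\| \int_{\varepsilon_1 \le |t| \le \varepsilon_2} \frac{f(x - t)\,dt}{t} \Bigr\|
&= \Bigl\| \int_{\varepsilon_1 \le |t| \le \varepsilon_2} \frac{[f(x - t) - f(x)]\,dt}{t} \Bigr\| \\
&\le \Bigl\| \int_{\varepsilon_1 \le |t| \le \varepsilon_2} \frac{|f(x - t) - f(x)|}{|t|}\,dt \Bigr\| \\
&\le \int_{\varepsilon_1 \le |t| \le \varepsilon_2} \frac{2M|t|\,\|I_E\|}{|t|}\,dt \\
&= 4M\,\|I_E\|\,(\varepsilon_2 - \varepsilon_1).
\end{aligned}
\]

The right side tends to 0 as $\varepsilon_1$ and $\varepsilon_2$ tend to 0, and the proof of the lemma is complete.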
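
As a supplementary remark, not part of the proof as printed: the first equality in the chain of estimates above rests on the fact that $1/t$ is odd, so that its integral over the symmetric region $\varepsilon_1 \le |t| \le \varepsilon_2$ vanishes. A short verification, with the logarithms written out as an assumption about how one would check it by hand:

\[
\int_{\varepsilon_1 \le |t| \le \varepsilon_2} \frac{dt}{t}
= \int_{\varepsilon_1}^{\varepsilon_2} \frac{dt}{t} + \int_{-\varepsilon_2}^{-\varepsilon_1} \frac{dt}{t}
= \log\frac{\varepsilon_2}{\varepsilon_1} - \log\frac{\varepsilon_2}{\varepsilon_1} = 0.
\]

Consequently subtracting $f(x) \int_{\varepsilon_1 \le |t| \le \varepsilon_2} dt/t = 0$ from the integral of $f(x - t)/t$ leaves its value unchanged, and the difference $f(x - t) - f(x)$ supplies the factor $|t|$ that cancels the singularity.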


Having now completely proved Theorem 9.23, let us return to a discussion of the proof of the Marcinkiewicz theorem, Theorem 9.20. The proof is considerably simplified by assuming that $q_1 = p_1$ and $q_2 = p_2$, which happens to be the special case of most interest to us, and we shall give a proof only under this additional hypothesis. The idea in the special case will be to estimate integrals of powers of functions by using Proposition 6.56b to reduce the estimates to facts about distribution functions.

The proof in general has the same flavor as the argument we give, but it involves also a subtler decomposition of $f$ into two parts, a nonobvious application of Hölder’s inequality, and a clever use of Proposition 9.8.


PROOF OF THEOREM 9.20 WHEN $p_1 = q_1 < p_2 = q_2$. We divide matters into two cases, the first when $p_2 < \infty$ and the second when $p_2 = \infty$.
We begin with the case with $p_2 < \infty$. Let

\[
\lambda(\xi) = \lambda_{Tf}(\xi) = \nu\bigl(\{\,y \mid |(Tf)(y)| > \xi\,\}\bigr)
\]

be the distribution function of $Tf$ as in Section VI.10. Proposition 6.56b shows that

\[
\|Tf\|_p^p = p \int_0^\infty \xi^{p-1} \lambda(\xi)\,d\xi = 2^p p \int_0^\infty \xi^{p-1} \lambda(2\xi)\,d\xi. \tag{$*$}
\]
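
The first equality in (*) can be checked numerically on a toy example. The following is a minimal sketch, assuming a finite measure space whose atoms carry the listed weights; the function names and the sample data are hypothetical and are not taken from the text. Because the distribution function of a simple function is a step function, the integral can be evaluated exactly piece by piece.

```python
def distribution_function(values, weights, xi):
    """lambda(xi): total weight of the atoms on which |g| > xi."""
    return sum(w for v, w in zip(values, weights) if abs(v) > xi)


def norm_p_to_the_p_direct(values, weights, p):
    """Integral of |g|^p over a finite measure space with the given atom weights."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))


def norm_p_to_the_p_via_distribution(values, weights, p):
    """p * integral_0^infinity xi^(p-1) lambda(xi) d(xi), evaluated exactly.

    lambda is constant between consecutive values of |g|, so the integral
    splits into pieces of the form lambda * (b^p - a^p).
    """
    total, a = 0.0, 0.0
    for b in sorted({abs(v) for v in values}):
        total += distribution_function(values, weights, a) * (b ** p - a ** p)
        a = b
    return total


# Hypothetical data: g takes four values on four atoms.
values = [3, -1, 2, 0]
weights = [0.5, 1.0, 2.0, 4.0]
p = 3
direct = norm_p_to_the_p_direct(values, weights, p)
via_lambda = norm_p_to_the_p_via_distribution(values, weights, p)
print(direct, via_lambda)
```

The two printed numbers agree, which is the content of the first equality in (*) for this particular simple function; the sketch does not replace the proof of Proposition 6.56b.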

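With $\xi > 0$ fixed, we shall estimate $\lambda(2\xi)$. We decompose $f$ as $f = f_1 + f_2$ with

\[
f_1(x) =
\begin{cases}
f(x) & \text{if } |f(x)| > \xi, \\
0 & \text{otherwise,}
\end{cases}
\qquad
f_2(x) =
\begin{cases}
f(x) & \text{if } |f(x)| \le \xi, \\
0 & \text{otherwise.}
\end{cases}
\]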


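Just as in the proof of Proposition 9.4c, $f_1$ is in $L^{p_1}(X, \mu)$ and $f_2$ is in $L^{p_2}(X, \mu)$. Because $f = f_1 + f_2$, sublinearity of $T$ gives $|Tf| \le |Tf_1| + |Tf_2|$. If $\lambda_1$ and $\lambda_2$ are the distribution functions of $Tf_1$ and $Tf_2$ and if $\alpha > 0$ is given, then

\[
\lambda(2\alpha) \le \lambda_1(\alpha) + \lambda_2(\alpha)
\]

because $|Tf|$ can be $> 2\alpha$ only if at least one of $|Tf_1|$ and $|Tf_2|$ is $> \alpha$. For every $\alpha > 0$, the assumption that $T$ is of weak types $(p_1, p_1)$ and $(p_2, p_2)$ gives

\[
\lambda_1(\alpha) \le \Bigl(\frac{M_1 \|f_1\|_{p_1}}{\alpha}\Bigr)^{p_1}
\qquad\text{and}\qquad
\lambda_2(\alpha) \le \Bigl(\frac{M_2 \|f_2\|_{p_2}}{\alpha}\Bigr)^{p_2}.
\]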

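For $\alpha = \xi$, we therefore obtain

\[
\begin{aligned}
\lambda(2\xi) \le \lambda_1(\xi) + \lambda_2(\xi)
&\le M_1^{p_1} \xi^{-p_1} \int_X |f_1|^{p_1}\,d\mu + M_2^{p_2} \xi^{-p_2} \int_X |f_2|^{p_2}\,d\mu \\
&= M_1^{p_1} \xi^{-p_1} \int_{\{|f| > \xi\}} |f|^{p_1}\,d\mu + M_2^{p_2} \xi^{-p_2} \int_{\{|f| \le \xi\}} |f|^{p_2}\,d\mu. \qquad (**)
\end{aligned}
\]

With the estimate for $\lambda(2\xi)$ in hand, we can now let $\xi$ vary and estimate $\|Tf\|_p^p$. From (*) and (**) we obtain $\|Tf\|_p^p \le I_1 + I_2$, where

\[
I_1 = 2^p p M_1^{p_1} \int_0^\infty \xi^{p - p_1 - 1} \int_{\{|f(x)| > \xi\}} |f(x)|^{p_1}\,d\mu(x)\,d\xi
\]

and

\[
I_2 = 2^p p M_2^{p_2} \int_0^\infty \xi^{p - p_2 - 1} \int_{\{|f(x)| \le \xi\}} |f(x)|^{p_2}\,d\mu(x)\,d\xi.
\]

Fubini’s Theorem gives

\[
I_1 = 2^p p M_1^{p_1} \int_X |f(x)|^{p_1} \Bigl[\int_0^{|f(x)|} \xi^{p - p_1 - 1}\,d\xi\Bigr] d\mu(x) = \frac{2^p p M_1^{p_1}}{p - p_1} \int_X |f|^p\,d\mu.
\]

Similarly

\[
I_2 = \frac{2^p p M_2^{p_2}}{p_2 - p} \int_X |f|^p\,d\mu,
\]

and thus $\|Tf\|_p^p \le C^p \|f\|_p^p$ as required.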
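
The word "Similarly" hides a one-line computation. The following sketch spells it out and records one admissible value of the constant $C$; the explicit formula for $C$ is bookkeeping added here for illustration and is not claimed to be the best constant. Since $p < p_2$, the exponent $p - p_2 - 1$ is less than $-1$, and interchanging the order of integration gives

\[
I_2 = 2^p p M_2^{p_2} \int_X |f(x)|^{p_2} \Bigl[\int_{|f(x)|}^{\infty} \xi^{p - p_2 - 1}\,d\xi\Bigr] d\mu(x)
= 2^p p M_2^{p_2} \int_X |f(x)|^{p_2}\,\frac{|f(x)|^{p - p_2}}{p_2 - p}\,d\mu(x)
= \frac{2^p p M_2^{p_2}}{p_2 - p} \int_X |f|^p\,d\mu .
\]

Adding the two estimates yields

\[
\|Tf\|_p^p \le I_1 + I_2 = 2^p p \Bigl(\frac{M_1^{p_1}}{p - p_1} + \frac{M_2^{p_2}}{p_2 - p}\Bigr) \|f\|_p^p ,
\]

so that $C^p$ may be taken to be the factor in front of $\|f\|_p^p$. The denominators show why the argument needs $p_1 < p < p_2$ strictly.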

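The remaining case to handle has $p_2 = \infty$. The general line of the argument is the same as above, but there are small differences. With $\xi$ fixed, the definitions of $f_1$ and $f_2$ are adjusted to be

\[
f_1(x) =
\begin{cases}
f(x) & \text{if } |f(x)| > \xi / \|T\|_\infty, \\
0 & \text{otherwise,}
\end{cases}
\]

and $f_2 = f - f_1$. Then $\|f_2\|_\infty \le \xi / \|T\|_\infty$, $\|Tf_2\|_\infty \le \xi$, and $\lambda_2(\xi) = 0$. Hence

\[
\lambda(2\xi) \le \lambda_1(\xi) + \lambda_2(\xi) = \lambda_1(\xi) \le M_1^{p_1} \xi^{-p_1} \int_{\{|f| > \xi / \|T\|_\infty\}} |f|^{p_1}\,d\mu,
\]

and then the proof can proceed along the lines above.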


7. Problems 


1. For a measure space of finite measure, prove that $L^p \subseteq L^q$ whenever $p \ge q \ge 1$. More particularly prove, for the case that the total measure is 1, that $\|f\|_q \le \|f\|_p$ whenever $p \ge q \ge 1$.

2. Let $p, q, r$ be real numbers in $[1, +\infty]$ with $\frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 1$. Using the equality
c + c = | and Hélder’s inequality, prove that = lfghidu <IlfllpigigllAll,- 
3. For a measure space of finite measure, let $\{f_n\}$ be a sequence of measurable functions converging pointwise to $f$. Suppose that $1 \le q < p < \infty$, and suppose that the sequence of numbers $\{\|f_n\|_p\}$ is bounded. Using Egoroff’s Theorem (Problem 17, Chapter V) or uniform integrability (Problem 21, Chapter V), prove that $f_n \to f$ in $L^q$.

4. This problem produces an example of a measure space in which two distinct members of $L^\infty$ act as the same linear functional on $L^1$. The measure space $(X, \mathcal{A}, \mu)$ has $X$ consisting of a single point $p$, $\mathcal{A} = \{\varnothing, X\}$, and $\mu(X) = +\infty$.
(a) Show that $\dim L^1(X) = 0$ and $\dim L^\infty(X) = 1$.
(b) Proposition 9.8 assumed $\sigma$-finiteness to ensure its conclusion when $p = \infty$. Show that the conclusion of Proposition 9.8 fails for $p = \infty$ in this example.

5. If $f$ is real-valued and integrable on the measure space $(X, \mathcal{A}, \mu)$, what are all the Hahn decompositions for the signed measure $\nu(E) = \int_E f\,d\mu$?


6. Provide examples of each of the following. Each example can be produced on one of the following three algebras of subsets of a set $X$: the finite subsets of $X$ and their complements, all subsets of a countable set $X$, the Borel sets of $X = (0, 1]$.
(a) An additive set function $\nu$ on an algebra of sets with $|\nu(X)| < \infty$ but with $\sup_E |\nu(E)| = \infty$.
(b) A counterexample to the Hahn decomposition if the assumption “$\sigma$-algebra” is relaxed to “algebra” but the other assumptions are left in place.
(c) A finite measure $\nu$ and a non $\sigma$-finite measure $\mu$, both defined on a $\sigma$-algebra, such that $\nu \ll \mu$ but $\nu$ is not given by an integral with respect to $\mu$.


Problems 7–8 concern harmonic functions and the Poisson integral formula for the unit disk in $\mathbb{R}^2$. These matters were the subject of Problems 27–29 at the end of Chapter I, Problems 14–15 at the end of Chapter III, Problems 10–13 at the end of Chapter IV, and Problems 18–20 at the end of Chapter VI. Problem 7 updates the results from Chapter VI so that they apply for $1 < p < \infty$, and Problem 8 uses weak-star convergence to establish a converse result.

7. If $1 < p < \infty$ and if $f$ is in $L^p\bigl([-\pi, \pi], \frac{1}{2\pi}\,d\theta\bigr)$, prove that the Poisson integral $u(r, \theta)$ of $f$ has the properties that $\|u(r, \cdot\,)\|_p \le \|f\|_p$ for $0 \le r < 1$ and that $u(r, \cdot\,)$ tends to $f$ in $L^p$ in the sense that $\lim_{r \uparrow 1} \|u(r, \cdot\,) - f\|_p = 0$.

8. Suppose that $1 < p < \infty$ and that $u(r, \theta)$ is a harmonic function on the open unit disk such that $\sup_{0 \le r < 1} \|u(r, \cdot\,)\|_p$ is finite. By using Problem 13 at the end of Chapter IV and taking a weak-star limit of a suitable sequence of functions $u(r_n, \theta)$ with $\{r_n\}$ increasing to 1, prove that $u(r, \theta)$ is the Poisson integral of a function in $L^p\bigl([-\pi, \pi], \frac{1}{2\pi}\,d\theta\bigr)$.


Problems 9–12 concern decomposing any bounded nonnegative additive set function on an algebra into a completely additive part and a “purely finitely additive” part. They make use of Zorn’s Lemma (Section A9 of the appendix). A bounded nonnegative additive set function $\mu$ will be called purely finitely additive if there is no nonzero completely additive set function $\nu$ such that $0 \le \nu(E) \le \mu(E)$ for all $E$.

9. Suppose that $\mu$ is an additive set function on the $\sigma$-algebra of all subsets of the integers such that $\mu$ has image $\{0, 1\}$ and $\mu(\{n\}) = 0$ for every integer $n$. Prove that $\mu$ is purely finitely additive. (Such a $\mu$ was constructed by means of a nontrivial ultrafilter in Problems 39–41 at the end of Chapter V.)


10. Use Zorn’s Lemma to show that any bounded nonnegative additive set function 
is the sum of a nonnegative completely additive set function and a purely finitely 
additive set function. 


11. Prove that if $\nu$ is a bounded nonnegative completely additive set function and if $\mu$ is bounded nonnegative and purely finitely additive with $0 \le \mu(E) \le \nu(E)$ for all $E$, then $\mu = 0$.


12. Deduce from the previous problem and the Jordan Decomposition Theorem that 
the decomposition of Problem 10 is unique. 


Problems 13–15 prove the theorem, for the case of $\mathbb{R}^2$, of Jessen–Marcinkiewicz–Zygmund concerning strong differentiation of integrals of $L^p$ functions almost everywhere when $p > 1$. Strong differentiation holds at $(x, y)$ for the locally integrable function $f$ on $\mathbb{R}^2$ if

\[
\lim_{\substack{\operatorname{diam}(R) \to 0, \\ R = \text{geometric rectangle} \\ \text{centered at } (x, y)}} \frac{1}{m(R)} \int_R f(u, v)\,du\,dv = f(x, y).
\]

Let $f^{**}$ be the associated maximal function, given by

\[
f^{**}(x, y) = \sup_{\substack{R = \text{geometric rectangle} \\ \text{centered at } (x, y)}} \frac{1}{m(R)} \int_R |f(u, v)|\,dv\,du.
\]

13. Let $f_1(x, y)$ be the value of the one-dimensional Hardy–Littlewood maximal function of $y \mapsto f(x, y)$, and let $f_2(x, y)$ be the value of the one-dimensional Hardy–Littlewood maximal function of $x \mapsto f_1(x, y)$. Prove that $f^{**}(x, y) \le f_2(x, y)$.

14. Using Corollary 9.21 and the previous problem, prove that $\|f^{**}\|_p \le A_p^2 \|f\|_p$ if $1 < p < \infty$.

15. Conclude that strong differentiation holds almost everywhere for each $f$ in $L^p(\mathbb{R}^2)$ if $1 < p < \infty$.

Problems 16–19 concern the Hilbert transform $H$ defined in Section VIII.7 and Theorem 9.23. The operator $H$ is defined on $L^p(\mathbb{R}^1)$ for $1 < p < \infty$. Recall the functions $h_\varepsilon$, $Q_\varepsilon$, and $\psi_\varepsilon$ on $\mathbb{R}^1$ satisfying $Q_\varepsilon = h_\varepsilon + \psi_\varepsilon$. Let $f$ be in $L^p$, and let $f^*$ be the Hardy–Littlewood maximal function of $f$.

16. Prove that there exists a continuous integrable function $\Phi \ge 0$ on $\mathbb{R}^1$ of the form $\Phi(x) = \Phi_0(|x|)$, where $\Phi_0$ is a decreasing $C^1$ function on $[0, \infty)$, such that the function $\psi_\varepsilon$ for $\varepsilon = 1$ satisfies $|\psi_1| \le \Phi$.

17. Deduce from the previous problem and Corollary 6.42 that $\sup_{\varepsilon > 0} |(\psi_\varepsilon * f)(x)| \le C f^*(x)$. How does it follow that $\lim_{\varepsilon \downarrow 0} (\psi_\varepsilon * f)(x) = 0$ almost everywhere for all $f$ in $L^p$, $1 < p < \infty$?

18. Prove that $Q_\varepsilon * f = P_\varepsilon * (Hf)$ for $f \in L^p$ with $1 < p < \infty$, where $P_\varepsilon(x) = P(x, \varepsilon)$ is the Poisson kernel.

19. Deduce from the previous two problems that the limit in the equality

\[
Hf(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{\pi} \int_{|t| \ge \varepsilon} \frac{f(x - t)\,dt}{t}
\]

of Theorem 9.23 may be interpreted as an almost-everywhere limit if $f$ is in $L^p(\mathbb{R}^1)$ and $1 < p < \infty$.


Problems 20–22 prove the theorem of M. Riesz that the partial sums of the Fourier series of a function in $L^p([-\pi, \pi])$ converge to the function in $L^p$ if $1 < p < \infty$. Recall from Sections I.10 and VI.7 that if $f$ is integrable on $[-\pi, \pi]$, then the $n$-th partial sum of the Fourier series of $f$ is given by $(S_n f)(x) = (D_n * f)(x)$, where $D_n$ is the Dirichlet kernel

\[
D_n(t) = \frac{\sin\bigl((n + \tfrac12)t\bigr)}{\sin \tfrac12 t}
\]

and the convolution is taken relative to $\frac{1}{2\pi}\,dt$.

20. Suppose it can be proved that $\|S_n f\|_p \le A_p \|f\|_p$ for $1 < p < \infty$ with $A_p$ independent of $n$ and $f$. Prove that $S_n f \to f$ in $L^p$ for all $f$ in $L^p$, provided $1 < p < \infty$.


Bt Define Bag) = 2 sag < ltl < w and E,(t) = 0 for |t| < 544. 
Then extend $E_n(t)$ periodically. Show that $D_n - E_n = \varphi_n$ is integrable on $[-\pi, \pi]$ with $\|\varphi_n\|_1 \le C$ independently of $n$, and say why it is therefore enough to prove that the operators $T_n$ with $T_n f = E_n * f$ satisfy $\|T_n f\|_p \le B_p \|f\|_p$ for $1 < p < \infty$ with $B_p$ independent of $n$ and $f$.


22. In $E_n(t)$, write $\sin\bigl((n + \tfrac12)t\bigr)$ as a linear combination of two exponentials $e^{ikt}$, rewrite each exponential as $e^{-ik(x - t)} e^{ikx}$, and decompose the operator $T_n$ as the corresponding sum of two operators. By relating these two operators separately to the operators $H_\varepsilon$ in Theorem 9.23, prove that the $T_n$’s satisfy the desired estimate $\|T_n f\|_p \le B_p \|f\|_p$.


Problems 23–26 develop a kind of function-valued integration known as conditional expectation in probability theory. They make use of the Radon–Nikodym Theorem (Theorem 9.16). Let $(X, \mathcal{A}, \mu)$ be a measure space with $\mu(X) = 1$.

23. If $f$ is integrable and if $\mathcal{B}$ is a $\sigma$-algebra contained in $\mathcal{A}$, prove that there exists a function $E[f|\mathcal{B}]$ that
(i) is measurable with respect to $\mathcal{B}$ and
(ii) has $\int_B f\,d\mu = \int_B E[f|\mathcal{B}]\,d\mu$ for all $B$ in $\mathcal{B}$.
Show further that $E[f|\mathcal{B}]$ is unique in this sense: any two functions satisfying (i) and (ii) differ only on a set in $\mathcal{B}$ of $\mu$ measure 0.

24. Suppose that $X$ is a countable disjoint union of sets $X_n$ in $\mathcal{A}$ and that $\mathcal{B}$ consists of all possible unions of the $X_n$’s. Give an explicit formula for $E[f|\mathcal{B}]$.

25. Show that if $\mathcal{B} = \mathcal{A}$, then $E[f|\mathcal{B}] = f$ almost everywhere.

26. Let $\mathcal{B}$ and $\mathcal{C}$ be $\sigma$-algebras with $\mathcal{C} \subseteq \mathcal{B} \subseteq \mathcal{A}$. Prove the following:
(a) $E\bigl[E[f|\mathcal{B}] \,\big|\, \mathcal{C}\bigr] = E[f|\mathcal{C}]$ almost everywhere.
(b) If $f$ and $g$ are integrable and everywhere finite, then $E[f + g \mid \mathcal{B}] = E[f|\mathcal{B}] + E[g|\mathcal{B}]$ almost everywhere.
(c) If $g$ is measurable with respect to $\mathcal{B}$ and if $f$ and $fg$ are integrable, then $E[fg \mid \mathcal{B}] = g\,E[f|\mathcal{B}]$ almost everywhere.
(d) If $f$ and $g$ are in $L^2(X, \mathcal{A}, \mu)$, then $\int_X f\,E[g|\mathcal{B}]\,d\mu = \int_X E[f|\mathcal{B}]\,g\,d\mu$.


CHAPTER X 


Topological Spaces 


Abstract. This chapter extends considerably the framework for discussing convergence, limits, and 
continuity that was developed in Chapter II: topological spaces replace metric spaces. 

Section 1 makes various definitions, including definitions for the terms topology, open set, closed 
set, continuous function, base for a topology, separable, and subspace. It introduces two general 
kinds of constructions useful in analysis and other fields for forming new topological spaces out of
old ones: weak topologies and quotient topologies. The section gives several examples of each.

Sections 2-3 develop standard facts, mostly elementary, about how certain combinations of 
properties of topological spaces imply others. Examples show some limitations to such implications. 
Properties that are studied include Hausdorff, regular, normal, dense, compact, locally compact,
Lindelöf, and σ-compact.

Section 4 discusses product topologies on arbitrary product spaces, an example of a weak 
topology. The main theorem, the Tychonoff Product Theorem, says that the product of compact 
spaces is compact. 

Section 5 introduces nets, a generalization of sequences. Sequences by themselves are inadequate 
for detecting convergence in general topological spaces, and nets are a substitute. The use of nets in 
many cases provides an easier way of establishing properties of subsets of a topological space than 
direct arguments with open and closed sets. 

Section 6 elaborates on quotient topologies as introduced in Section 1. Conditions under which 
a quotient space is Hausdorff are of particular interest. 

Sections 7-8 prove and apply Urysohn’s Lemma, which says that any two disjoint closed sets 
in a normal topological space may be separated by a real-valued continuous function. This result 
is fundamental to serious uses of topological spaces in analysis. One application is to showing that 
every separable Hausdorff regular topology arises from a metric. 

Section 9 extends Ascoli’s Theorem and the Stone–Weierstrass Theorem from their settings in
compact metric spaces in Chapter II to the wider setting of compact Hausdorff spaces. 


1. Open Sets and Constructions of Topologies 


In applications involving metric spaces, we have seen several times that the 
explicit form of a metric may not at all be one of the objects of interest for the space.
Instead, we may be interested in the open sets, or in convergence, or in continuity, 
or in some other aspect of the space. The same open sets, convergence, and
continuity may come from two different metrics, and we have even encountered
notions of convergence that are not associated with any metric at all. We saw in 
Section II.5, for example, that we could associate three different natural-looking 
metrics to the product X x Y of two metric spaces, and the three metrics led to 
the same open sets, the same convergence of sequences, and the same continuous 
functions. On the other hand, the notions in Chapter V of pointwise convergence, 
convergence almost everywhere, and weak-star convergence were defined without 
reference to a metric, and depending on the details of the situation, there need 
not be metrics yielding these notions of convergence. We have brushed against 
further, more subtle situations with one or the other of these phenomena—no 
special distinguished metric or no metric at all—but there is no need to produce 
a complete list. The present chapter introduces and studies an abstract gener- 
alization of the notion of a metric space, namely a “topology,” that makes it 
unnecessary to have the kind of explicit formula demanded by the definition of 
metric space. 

The framework for a “topological space” consists of a nonempty set and a collection of “open sets” satisfying the conditions of Proposition 2.5. Thus let $X$ be a nonempty set. A set $\mathcal{T}$ of subsets of $X$ is called a topology for $X$ if

(i) $X$ and $\varnothing$ are in $\mathcal{T}$,
(ii) any union of members of $\mathcal{T}$ is a member of $\mathcal{T}$,
(iii) any finite intersection of members of $\mathcal{T}$ is a member of $\mathcal{T}$.

The members of $\mathcal{T}$ are called open sets, and $(X, \mathcal{T})$ is called a topological space.
When there is no chance for ambiguity, we may refer to X itself as a topological 
space. 
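
For a finite set the three defining conditions can be checked mechanically. The following is a minimal sketch under that finiteness assumption; the function name `is_topology` and the sample families are hypothetical. For a finite family, closure under pairwise unions and pairwise intersections already implies conditions (ii) and (iii), since the empty union is covered by requiring the empty set in (i).

```python
from itertools import combinations


def is_topology(X, T):
    """Check conditions (i)-(iii) for a family T of subsets of a finite set X."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    # (i) X and the empty set are members of T.
    if X not in T or frozenset() not in T:
        return False
    # Every member of T has to be a subset of X.
    if any(not U <= X for U in T):
        return False
    # (ii) and (iii), reduced to pairs because T is finite.
    return all((U | V) in T and (U & V) in T for U, V in combinations(T, 2))


X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
broken = [set(), {1}, {2}, {1, 2, 3}]   # the union {1, 2} is missing
print(is_topology(X, nested), is_topology(X, broken))
```

The family `nested` passes, while `broken` fails condition (ii). The check says nothing about infinite families, where arbitrary unions cannot be reduced to pairs.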

Every metric space furnishes an example of a topological space by virtue of 
Proposition 2.5; we refer to the topology in question as the metric topology for 
the space. Two other examples of general constructions leading to topological 
spaces will be given later in this section, and some specific examples of other 
kinds will be given in Section 2. 

Neighborhoods, open neighborhoods, interior, closed sets, limit points, and 
closure may be defined in the same way as in Section II.2. As remarked after 
Corollary 2.11, the proofs of certain results relating these notions depended only 
on the definitions and the three properties of open sets listed above. These 
results are Proposition 2.6 and Corollary 2.7 characterizing interior, Proposition 
2.8 giving properties of the family of all closed sets, Proposition 2.9 relating 
closed sets to limit points, and Proposition 2.10 and Corollary 2.11 characterizing 
closure. Thus we may take all those results as known for general topological 
spaces, and it is not necessary to repeat their statements here. 

The notion of continuity extends to topological spaces in straightforward fashion. Specifically the definition of continuity at a point is extracted from the statement of Proposition 2.13: if $X$ and $Y$ are topological spaces, a function $f : X \to Y$ is continuous at a point $x \in X$ if for any open neighborhood $V$ of $f(x)$ in $Y$, there is a neighborhood $U$ of $x$ such that $f(U) \subseteq V$. Then Corollary 2.14 is immediately available, saying that if $f : X \to Y$ is continuous at $x$ and $g : Y \to Z$ is continuous at $f(x)$, then the composition $g \circ f$ is continuous at $x$.

Proposition 2.15 and its proof are available also, saying that the function $f : X \to Y$ is continuous at every point of $X$ if and only if the inverse image under $f$ of every open set in $Y$ is open in $X$, if and only if the inverse image under $f$ of every closed set in $Y$ is closed in $X$. We say that $f : X \to Y$ is continuous if these equivalent conditions are satisfied. The function $f : X \to Y$ is said to be a homeomorphism if $f$ is continuous, $f$ is one-one and onto, and $f^{-1} : Y \to X$ is continuous. The relation “is homeomorphic to” is an equivalence relation.
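
The inverse-image criterion lends itself to a direct check when the spaces are finite. The sketch below assumes that a function is stored as a dictionary from points of $X$ to points of $Y$ and that a topology is given as an explicit list of open sets; all names and sample spaces are hypothetical. It also illustrates that a continuous one-one onto function need not be a homeomorphism.

```python
def preimage(f, V, X):
    """Inverse image of V under f, where f is a dict from points of X to points of Y."""
    return frozenset(x for x in X if f[x] in V)


def is_continuous(f, X, TX, TY):
    """True if the inverse image of every open set of Y is open in X."""
    open_in_X = {frozenset(U) for U in TX}
    return all(preimage(f, V, X) in open_in_X for V in TY)


def is_homeomorphism(f, X, TX, Y, TY):
    """True if f is one-one and onto and both f and its inverse are continuous."""
    if len(set(f.values())) != len(X) or set(f.values()) != set(Y):
        return False
    inverse = {y: x for x, y in f.items()}
    return is_continuous(f, X, TX, TY) and is_continuous(inverse, Y, TY, TX)


X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
discrete = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
Y = {"a", "b"}
TY = [set(), {"a"}, {"a", "b"}]

f = {1: "a", 2: "a", 3: "b"}   # inverse image of {"a"} is {1, 2}, which is open
g = {1: "b", 2: "a", 3: "a"}   # inverse image of {"a"} is {2, 3}, which is not open
identity = {1: 1, 2: 2, 3: 3}

print(is_continuous(f, X, nested, TY), is_continuous(g, X, nested, TY))
print(is_homeomorphism(identity, X, discrete, X, nested))
```

The identity map from the discrete topology to the coarser topology `nested` is continuous, one-one, and onto, yet its inverse is not continuous, so the last check fails.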

Now let us come to the two general constructions of topological spaces, known 
as “weak topologies” and “quotient topologies.” Both of these have many appli- 
cations in real analysis. 

The notion of “weak topology” starts from the fact that the intersection of a nonempty collection of topologies for a set is a topology; this fact is evident from the very definition. The prototype of a weak topology is the “product topology” for the product of a nonempty set of topological spaces. In the terminology of Section A1 of the appendix, if $S$ is a nonempty set and if $X_s$ is a nonempty set for each $s$ in $S$, then the Cartesian product $X = \times_{s \in S} X_s$ is the set of all functions $f$ from $S$ into $\bigcup_{s \in S} X_s$ such that $f(s)$ is in $X_s$ for all $s \in S$. Now suppose that each $X_s$ is a topological space, and let $p_s : X \to X_s$ be the $s$-th coordinate function, given by $p_s(f) = f(s)$. If $X$ is given the discrete topology $\mathcal{D}$, in which every subset of $X$ is open, then each $p_s$ is continuous; in fact, the inverse image of an open set in $X_s$ is some subset of $X$, and every subset of $X$ is in $\mathcal{D}$. Form the collection of all topologies $\mathcal{T}_\alpha$ on $X$ such that each $p_s : X \to X_s$ is continuous relative to $\mathcal{T}_\alpha$. The collection is nonempty since $\mathcal{D}$ is one. Let $\mathcal{T}$ be their intersection. The inverse image of any open set in $X_s$ under $p_s$ lies in $\mathcal{T}_\alpha$ for each $\alpha$ and hence lies in $\mathcal{T}$. Therefore each $p_s$ is continuous relative to $\mathcal{T}$. We speak of $\mathcal{T}$ as the “weakest topology” on $X$ such that all $p_s$ are continuous, and this topology for $X$ is called the product topology for $X$. We shall study product topologies in more detail in Section 4.

More generally let $X$ be a nonempty set, let $S$ be a nonempty set, let $X_s$ be a topological space for each $s$ in $S$, and suppose that we are given a function $f_s : X \to X_s$ for each $s$ in $S$. If $X$ is given the discrete topology, then every $f_s$ is continuous. Arguing as in the previous paragraph, we see that there exists a smallest topology for $X$ making all the functions $f_s$ continuous. This is called the weak topology for $X$ determined by $\{f_s\}_{s \in S}$.
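
On a finite set the weak topology can be computed explicitly. The sketch below does not intersect all admissible topologies as in the text; instead it uses the equivalent and more economical route of taking all inverse images of open sets, closing them under finite intersections, and then closing under unions. That swap, the function names, and the parity example are assumptions made for illustration.

```python
def topology_from_subbase(X, subbase):
    """Smallest topology on the finite set X containing every set in subbase."""
    X = frozenset(X)
    # All finite intersections of subbase members; X is the empty intersection.
    base = {X}
    for S in subbase:
        base |= {B & frozenset(S) for B in base}
    # All unions of members of the base; the empty set is the empty union.
    topology = {frozenset()}
    for B in base:
        topology |= {U | B for U in topology}
    return topology


def weak_topology(X, maps):
    """Weak topology on X determined by maps, a list of pairs (f, TY).

    Each f is a dict from X into a space Y, and TY lists the open sets of Y.
    """
    subbase = [
        frozenset(x for x in X if f[x] in V)
        for f, TY in maps
        for V in TY
    ]
    return topology_from_subbase(X, subbase)


# Hypothetical example: the parity map into a two-point discrete space.
X = {0, 1, 2, 3}
parity = {0: 0, 1: 1, 2: 0, 3: 1}
two_point_discrete = [set(), {0}, {1}, {0, 1}]
T = weak_topology(X, [(parity, two_point_discrete)])
print(sorted(sorted(U) for U in T))
```

The result consists of the empty set, the even points, the odd points, and $X$. It is strictly smaller than the discrete topology on $X$, which also makes the parity map continuous.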


EXAMPLES.

(1) Let $(X, d)$ be a metric space. Then the weak topology for $X$ determined by all functions $x \mapsto d(x, y)$ as $y$ varies through $X$ is the usual metric topology on $X$, as we readily check from the definitions.

(2) Let $X$ be a normed linear space with field of scalars $\mathbb{F}$, such as an $L^p$ space for $1 \le p \le \infty$, and let $X^*$ be the vector space of continuous linear functionals on $X$, as introduced in Section V.9. (For $X = L^p$ with $1 \le p < \infty$ and with the assumption that the underlying measure is $\sigma$-finite, Theorem 9.19 identified $X^*$ explicitly as $L^{p'}$, where $p'$ is the dual index to $p$.) Each member $x$ of $X$ defines a function $f_x : X^* \to \mathbb{F}$ by the formula $f_x(x^*) = x^*(x)$. The weak topology on $X^*$ determined by $X$ is called the weak-star topology on $X^*$ relative to $X$. The words “relative to $X$” are included in the terminology because two normed linear spaces $X$ might have the same set $X^*$ of continuous linear functionals. In Section V.9 we introduced a notion of weak-star convergence but no metric associated to it. In problems at the ends of Chapters VI, VIII, and IX, this kind of convergence became a powerful tool for working with harmonic functions, Poisson integrals, and positive definite functions. Later in the present chapter we shall relate topologies to convergence of sequences,¹ and it will be apparent that weak-star convergence as defined in Section V.9 is the appropriate notion of convergence for the newly defined weak-star topology.


(3) The construction in Example 2 can be transposed to other situations in which a topology is to be imposed on a vector space. For example, let $X$ be a normed linear space with field of scalars $\mathbb{F}$ equal to $\mathbb{R}$ or $\mathbb{C}$, and let $X^*$ be the vector space of continuous linear functionals on $X$. Then $X^*$ indexes a set of functions $x^* : X \to \mathbb{F}$. The weak topology on $X$ determined by $X^*$ is known as the weak topology on $X$. This topology arises in some advanced situations, but we shall not have occasion to make use of it in the present volume.


(4) We have encountered three vector spaces of scalar-valued smooth functions on open sets of Euclidean space: in Section III.2 the space $C^\infty(U)$ of all smooth functions on $U$, in Section VIII.4 the space $C^\infty_{\mathrm{com}}(U)$ of all smooth functions on $U$ with compact support contained in $U$, and in Section VIII.4 the space $\mathcal{S}(\mathbb{R}^N)$ of Schwartz functions defined on $\mathbb{R}^N$. The subject of partial differential equations makes extensive use of functions of all three of these kinds, and it is necessary to be able to discuss convergence for them. The easiest convergence to describe is for $C^\infty(U)$, where convergence is to mean uniform convergence of the function and all of its partial derivatives on each compact subset of $U$. Uniform convergence by itself is captured by the supremum norm, and somehow we want to work here with the supremum norms of the function and each of its partial derivatives on each compact subset. The appropriate topology turns out to be the weak topology determined by all the functions $f \mapsto \|f - g\|$, where $\|\cdot\|$ is the supremum of some iterated partial derivative on some compact subset of $U$. This construction is carried out in detail in the companion volume Advanced Real Analysis. A topology for the Schwartz space $\mathcal{S}(\mathbb{R}^N)$ is obtained in a qualitatively similar way. The topology for $C^\infty_{\mathrm{com}}(U)$ is more subtle, and it too is constructed in the companion volume.

¹And to “nets,” which are a generalization of sequences.


The second general construction of topological spaces is the “quotient topology” for the set of equivalence classes on $X$ when $X$ is a topological space and some equivalence relation² has been specified on $X$. If the relation is written as $\sim$, the set of equivalence classes may be written as $X/\!\sim$, and the quotient map, i.e., passage from each member of $X$ to its equivalence class, is a well-defined function $q : X \to X/\!\sim$. With a topology in place on $X$, define a subset $U$ of $X/\!\sim$ to be open if $q^{-1}(U)$ is open. Since inverse images of functions preserve set-theoretic operations, it is immediate that the resulting collection of open subsets of $X/\!\sim$ is a topology for $X/\!\sim$ and that this topology makes $q$ continuous. This topology is called the quotient topology for $X/\!\sim$. In any other topology $\mathcal{T}'$ on $X/\!\sim$, any subset $V$ of $X/\!\sim$ that is open in $\mathcal{T}'$ but not open in the quotient topology must have the property that $q^{-1}(V)$ is not open; this condition implies that $q$ is not continuous when $\mathcal{T}'$ is the topology on $X/\!\sim$. Therefore the quotient topology is the finest topology on $X/\!\sim$ that makes the quotient map continuous, in the sense that it contains all topologies making $q$ continuous.
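
For a finite space the definition of the quotient topology translates directly into a computation: a set of equivalence classes is open exactly when its inverse image under the quotient map is open. The following is a minimal sketch, assuming the quotient map is stored as a dictionary from points to class labels; the names and the three-point example are hypothetical.

```python
from itertools import combinations


def powerset(s):
    """All subsets of the finite collection s, as frozensets."""
    items = list(s)
    return (
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    )


def quotient_topology(X, TX, q):
    """Open sets of the quotient: sets of classes whose inverse image is open.

    q is a dict sending each point of X to a label for its equivalence class.
    """
    open_in_X = {frozenset(U) for U in TX}
    classes = set(q.values())
    return {
        U
        for U in powerset(classes)
        if frozenset(x for x in X if q[x] in U) in open_in_X
    }


# Hypothetical example: identify the points 2 and 3 of a three-point space.
X = {1, 2, 3}
TX = [set(), {1}, {1, 2}, {1, 2, 3}]
q = {1: "a", 2: "b", 3: "b"}
TQ = quotient_topology(X, TX, q)
print(sorted(sorted(U) for U in TQ))
```

Here the class `"b"` alone is not open because its inverse image $\{2, 3\}$ is not open in $X$, so the quotient has three open sets.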


EXAMPLES. 


(1) Let $(X, d)$ be a pseudometric space such as the set of all integrable functions on some measure space $(S, \mathcal{A}, \mu)$ with $d(g, h) = \int_S |g - h|\,d\mu$. The pseudometric on $X$ gives $X$ a topology. For $x$ and $y$ in $X$, define $x \sim y$ if $d(x, y) = 0$. The result is an equivalence relation, and we know from Proposition 2.12 that the pseudometric $d$ descends to be a metric on the set $X/\!\sim$ of equivalence classes. The quotient topology on $X/\!\sim$ coincides with the topology defined by this metric.


(2) Let $X$ be the interval $[-\pi, \pi]$ with its usual topology from the metric on $\mathbb{R}$, let $S^1$ be the unit circle in $\mathbb{C}$ with its usual topology from the metric on $\mathbb{C}$, and let $q : X \to S^1$ be given by $q(x) = e^{ix}$. We can consider $S^1$ as the set of equivalence classes of $X$ under the relation that lets $-\pi$ and $\pi$ be the only nontrivial pair of elements of $X$ that are equivalent. The function $q$ is continuous, and it carries compact sets to compact sets. In Problem 11 at the end of the chapter, we shall see that $q$ exhibits $S^1$ as having the quotient topology.


(3) Let $X$ be the line $\mathbb{R}$ with its usual metric, let $S^1$ be the unit circle as in the previous example, and let $q : X \to S^1$ be given by $q(x) = e^{ix}$. The domain $X$ is a group, and the function $q$ identifies $S^1$ set-theoretically as the quotient group $\mathbb{R}/2\pi\mathbb{Z}$, where $\mathbb{Z}$ is the subgroup of integers. This example illustrates the natural topology to impose on any quotient of a group when the group has a topology for which all translations are homeomorphisms.³

²Equivalence relations and their connection with equivalence classes are discussed in Section A6 of the appendix.


In many situations the problem of describing what sets are to be open sets in a topological space is simplified by the notion of a base for a topology. By a base $\mathcal{B}$ for the topology $\mathcal{T}$ on $X$ is meant a subfamily of members of $\mathcal{T}$ such that every member of $\mathcal{T}$ is a union of sets in $\mathcal{B}$. In Chapter II the topology for a metric space was really introduced by specifying that the family of all open balls is to be a base. Arguing as with Proposition 2.31, we obtain the following result.


Proposition 10.1. A family $\mathcal{B}$ of subsets of a nonempty set $X$ is a base for some topology $\mathcal{T}$ on $X$ if and only if

(a) $X = \bigcup_{B \in \mathcal{B}} B$ and
(b) whenever $U$ and $V$ are in $\mathcal{B}$ and $x$ is in $U \cap V$, then there is a $B$ in $\mathcal{B}$ such that $x$ is in $B$ and $B \subseteq U \cap V$.

In this case the topology $\mathcal{T}$ is necessarily the set of all unions of members of $\mathcal{B}$, and hence $\mathcal{T}$ is determined by $\mathcal{B}$. A family $\mathcal{B}$ of subsets of $X$ is a base for a particular given topology $\mathcal{T}_0$ on $X$ if and only if (a) holds and

(b′) for each $x \in X$ and member $U$ of $\mathcal{T}_0$ containing $x$, there is some member $B$ of $\mathcal{B}$ such that $x$ is in $B$ and $B$ is contained in $U$.


REMARK. Condition (b) is satisfied if $\mathcal{B}$ is closed under finite intersections. Thus any family of subsets of $X$ that is closed under finite intersections and has union $X$ is a base for some topology on $X$.
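
Conditions (a) and (b) of Proposition 10.1 are finite checks when $X$ is finite. The following is a minimal sketch under that assumption, with hypothetical function names and sample families; it also forms the topology as the set of all unions of members of the base.

```python
def is_base_for_some_topology(X, B):
    """Check conditions (a) and (b) of Proposition 10.1 on a finite set X."""
    X = frozenset(X)
    B = [frozenset(b) for b in B]
    # (a) The members of B have union X.
    if frozenset().union(*B) != X:
        return False
    # (b) Each point of U & V lies in some member of B contained in U & V.
    for U in B:
        for V in B:
            for x in U & V:
                if not any(x in W and W <= (U & V) for W in B):
                    return False
    return True


def generated_topology(B):
    """All unions of members of B; the empty set is the empty union."""
    topology = {frozenset()}
    for b in B:
        topology |= {U | frozenset(b) for U in topology}
    return topology


X = {1, 2, 3}
singletons = [{1}, {2}, {3}]        # a base for the discrete topology
overlapping = [{1, 2}, {2, 3}]      # fails (b) at the point 2
chain = [{1}, {1, 2}, {1, 2, 3}]    # closed under finite intersections
print(is_base_for_some_topology(X, singletons),
      is_base_for_some_topology(X, overlapping),
      is_base_for_some_topology(X, chain))
print(len(generated_topology(singletons)), len(generated_topology(chain)))
```

The family `chain` illustrates the remark: it is closed under finite intersections and has union $X$, so it is a base, and the topology it generates has four members.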


A topological space $(X, \mathcal{T})$ is said to be separable⁴ if $\mathcal{T}$ has a base consisting of only countably many sets. A separable metric space has a countable base consisting entirely of open balls.

As with metric spaces, there is a natural definition of subspaces for general topological spaces. If $(X, \mathcal{T})$ is a topological space and if $A$ is a nonempty subset of $X$, then the relative topology for $A$ is the family of all sets $U \cap A$ with $U$ in $\mathcal{T}$. We can write $\mathcal{T} \cap A$ for this family. It is a simple matter to check that $\mathcal{T} \cap A$ is indeed a topology for $A$, and we say that $(A, \mathcal{T} \cap A)$ is a topological subspace of $(X, \mathcal{T})$. If there is no possibility of confusion and if the relative topology is understood, we may say that “$A$ is a subspace of $X$.”


3. The definition of “topological group,” which is given in the companion volume Advanced Real 
Analysis, imposes further conditions beyond the fact that every translation is a homeomorphism. 

4. Some authors use the word “separable” to mean that X has a countable dense set, but the 
meaning in the text here is becoming more and more common. The existence of a countable dense 
set is not a particularly useful property for a general topological space. 
Proposition 10.2. If $A$ and $B$ are subspaces of a topological space $X$ with 
$B \subseteq A \subseteq X$, then the relative topology of $B$ considered as a 
subspace of $X$ is identical to the relative topology of $B$ considered as a 
subspace of $A$. 


PROOF. The relative topology of $B$ considered as a subspace of $X$ consists of 
all sets $U \cap B$ with $U$ open in $X$, and the relative topology of $B$ 
considered as a subspace of $A$ consists of all sets $(U \cap A) \cap B$ with $U$ 
open in $X$. Thus the result follows from the identity 
$(U \cap A) \cap B = U \cap (A \cap B) = U \cap B$. 
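On a finite space the relative topology and the statement of Proposition 10.2 can be verified directly. The sketch below is a hedged illustration only: the four-point space and the names `relative_topology` and `is_topology` are assumptions made for the example, not objects defined in the text.

```python
def relative_topology(T, A):
    """Return the family of all sets U ∩ A with U in T."""
    A = frozenset(A)
    return {frozenset(U) & A for U in T}


def is_topology(X, T):
    """Check the topology axioms on a finite set X (pairwise checks suffice)."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if frozenset() not in T or X not in T:
        return False
    return all((U | V) in T and (U & V) in T for U in T for V in T)


# Hypothetical example: a topology on {1, 2, 3, 4} and nested subsets B ⊆ A.
X = {1, 2, 3, 4}
T = [set(), {1}, {1, 2}, {1, 3}, {1, 2, 3}, {1, 2, 3, 4}]
A = {1, 2, 3}
B = {2, 3}

T_A = relative_topology(T, A)           # relative topology of A in X
B_from_X = relative_topology(T, B)      # B viewed as a subspace of X
B_from_A = relative_topology(T_A, B)    # B viewed as a subspace of A

print(is_topology(A, T_A), B_from_X == B_from_A)
```

The equality of `B_from_X` and `B_from_A` is exactly the identity used in the proof, applied to each open set of the sample space.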


The next two propositions are proved in the same way as Proposition 2.26 and 
Corollary 2.27. 


Proposition 10.3. If $A$ is a subspace of a topological space $X$, then the closed 
sets of $A$ are all sets $F \cap A$, where $F$ is closed in $X$. Consequently $B$ 
is closed in $A$ if and only if $B = B^{\mathrm{cl}} \cap A$. 


Proposition 10.4. If $X$ and $Y$ are topological spaces and $f : X \to Y$ is 
continuous at a point $a$ of a subspace $A$ of $X$, then the restriction 
$f|_A : A \to Y$ is continuous at $a$. Also, $f$ is continuous at $a$ if and only 
if the function $f_0 : X \to f(X)$ obtained by redefining the range to be the 
image is continuous at $a$. 


2. Properties of Topological Spaces 


Proposition 2.30 listed certain properties of metric spaces as “separation 
properties.” These properties are not shared by all topological spaces, and 
instead we list them in this section as definitions. After giving the 
definitions, we shall examine implications among them and some roles that they 
play. The disproofs of certain implications provide an opportunity to introduce 
some further examples of topological spaces beyond those obtained from the 
constructions in Section 1. 

Let $(X, \mathcal{T})$ be a topological space. We say that 


(i) $X$ is a $T_1$ space if every one-point set in $X$ is closed, 
(ii) $X$ is Hausdorff if for any two distinct points $x$ and $y$ of $X$, there are 
disjoint open sets $U$ and $V$ with $x \in U$ and $y \in V$, 
(iii) $X$ is regular if for any point $x \in X$ and any closed set $F \subseteq X$ 
with $x \notin F$, there are disjoint open sets $U$ and $V$ with $x \in U$ and 
$F \subseteq V$, 
(iv) $X$ is normal if for any two disjoint closed subsets $E$ and $F$ of $X$, there 
are disjoint open sets $U$ and $V$ such that $E \subseteq U$ and $F \subseteq V$. 
Proposition 2.30 listed one further property of an arbitrary metric space $X$, 
namely that any two disjoint closed sets can be separated by a continuous 
function from $X$ into $[0, 1]$. Urysohn’s Lemma in Section 7 will establish this 
property for any normal topological space. 


Proposition 10.5. If $(X, \mathcal{T})$ is a topological space, then 


(a) $X$ is $T_1$ if and only if for any pair of distinct points $x$ and $y$, there 
are open sets $U$ and $V$ such that $x \in U$, $y \notin U$, $x \notin V$, and 
$y \in V$, 

(b) $X$ is regular if and only if for any point $x$ and any closed set $F$ with 
$x \notin F$, there is an open set $U$ such that $x \in U$ and 
$U^{\mathrm{cl}} \cap F = \varnothing$, 

(c) $X$ is normal if and only if for any pair of disjoint closed sets $E$ and $F$, 
there is an open set $U$ such that $E \subseteq U$ and 
$U^{\mathrm{cl}} \cap F = \varnothing$. 


PROOF. If $X$ is $T_1$ and if $x$ and $y$ are given, we can choose $U = \{y\}^c$ 
and $V = \{x\}^c$. In the reverse direction, if $x$ is given, choose, for each 
$y \neq x$, an open set $V_y$ such that $x \notin V_y$ and $y \in V_y$; then 
$\{x\}^c = \bigcup_y V_y$ is open, and hence $\{x\}$ is closed. 

If $X$ is regular and if $x$ and $F$ are given, we can choose disjoint open sets 
$U$ and $V$ with $x \in U$ and $F \subseteq V$. Then the closed set $V^c$ has 
$V^c \supseteq U$ and $V^c \cap F = \varnothing$; therefore also 
$V^c \supseteq U^{\mathrm{cl}}$ and $U^{\mathrm{cl}} \cap F = \varnothing$. In the 
reverse direction, suppose that $x$ and $F$ are given and that $U$ is an open set 
with $x \in U$ and $U^{\mathrm{cl}} \cap F = \varnothing$; choosing 
$V = (U^{\mathrm{cl}})^c$, we see that $x \in U$, $F \subseteq V$, and 
$U \cap V = \varnothing$. 

If $X$ is normal and if $E$ and $F$ are given, we can choose disjoint open sets 
$U$ and $V$ with $E \subseteq U$ and $F \subseteq V$. Then the closed set $V^c$ has 
$V^c \supseteq U$ and $V^c \cap F = \varnothing$; therefore also 
$V^c \supseteq U^{\mathrm{cl}}$ and $U^{\mathrm{cl}} \cap F = \varnothing$. In the 
reverse direction, suppose that $E$ and $F$ are given and that $U$ is an open set 
with $E \subseteq U$ and $U^{\mathrm{cl}} \cap F = \varnothing$; choosing 
$V = (U^{\mathrm{cl}})^c$, we see that $E \subseteq U$, $F \subseteq V$, and 
$U \cap V = \varnothing$. 


Proposition 10.6. If $(X, \mathcal{T})$ is a topological space, then 


(a) if $X$ is $T_1$ and normal, then $X$ is regular, 
(b) if $X$ is $T_1$ and regular, then $X$ is Hausdorff, 
(c) if $X$ is Hausdorff, then $X$ is $T_1$. 


PROOF. In (a), if x and a disjoint closed set F are given, then {x} is closed, and 
the fact that X is normal implies that we can separate the closed sets {x} and F 
by disjoint open sets. In (b), if x and y are distinct points in X, then {y} is closed 
and the fact that X is regular implies that we can separate the point x and the 
disjoint closed set {y} by disjoint open sets. In (c), the fact that X is Hausdorff 
means that for any two distinct points x and y, there are disjoint open sets U and 
$V$ with $x \in U$ and $y \in V$. Then $X$ satisfies the condition in Proposition 
10.5a that was shown to be equivalent to the $T_1$ property. 
EXAMPLES. 


(1) A space that is not $T_1$, regular, or normal. Let $X = \{a, b, c\}$, and let 
$\mathcal{T} = \{\varnothing, \{a\}, \{a, b\}, \{a, c\}, \{a, b, c\}\}$. 


(2) A space that is $T_1$ but not Hausdorff. Let $X$ be an infinite set, and let 
$\mathcal{T}$ consist of the empty set and all complements of finite sets. 


(3) A Hausdorff space that is not regular. Let $X$ be the real line. A subset $U$ 
of $X$ is to be in $\mathcal{T}$ if for each point $x$ of $U$, there is an open 
interval $I_x$ containing $x$ such that every rational number in $I_x$ is in $U$. 
Then every open interval is in $\mathcal{T}$, and hence $X$ is certainly 
Hausdorff. On the other hand, the set of rationals is open in this topology, and 
therefore the set of irrationals is closed. The set of irrationals cannot be 
separated from the point 0 by disjoint open sets. 


(4) A Hausdorff regular space that is not normal. Let $X$ be the closed upper 
half plane $\{\operatorname{Im} z \ge 0\}$ in $\mathbb{C}$. A base for 
$\mathcal{T}$ consists of all open disks in $X$ that do not meet the $x$ axis, 
together with all open disks in $X$ that are tangent to the $x$ axis; the latter 
sets are to include the point of tangency. It is easy to see that $X$ is 
Hausdorff, but a little argument is needed to see that $X$ is regular. To begin 
with, every open set in the usual metric topology for $X$ is in $\mathcal{T}$, and 
hence every closed set in the usual metric topology for $X$ is closed relative to 
$\mathcal{T}$. Let $p$ be a point in $X$, and let $F$ be a $\mathcal{T}$ closed 
subset of $X$ not containing $p$. There is no difficulty in separating $p$ and $F$ 
by disjoint open sets if $p$ has $y$ coordinate positive, and we therefore assume 
that $p$ lies on the $x$ axis. Since $F$ is closed, Proposition 10.1 produces a 
basic open set $U$ tangent to the $x$ axis at $p$ such that 
$U \cap F = \varnothing$. If $D$ denotes a strictly smaller basic open set tangent 
to the $x$ axis at $p$, then the only point of the ordinary boundary of $U$ that 
lies in $D^{\mathrm{cl}}$ is $p$ itself. Thus 
$F \cap D^{\mathrm{cl}} = \varnothing$, and it follows that $D$ and 
$(D^{\mathrm{cl}})^c$ are disjoint open sets separating $p$ and $F$. Consequently 
$X$ is regular. We postpone the argument that $X$ is not normal until Section 7, 
when Urysohn’s Lemma will be available. 


(5) A normal space that is not regular. Let $X = \{a, b\}$, and let $\mathcal{T}$ 
consist of $\varnothing$, $\{a\}$, and $\{a, b\}$. 
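The finite spaces in Examples 1 and 5 are small enough that definitions (i) through (iv) can be tested exhaustively. The sketch below is an illustration, not part of the text: the helper names are hypothetical, and the topology used for Example 1 is assumed to be $\{\varnothing, \{a\}, \{a, b\}, \{a, c\}, \{a, b, c\}\}$.

```python
def space(points, opens):
    """Build (X, T) with X a frozenset and T a set of frozensets."""
    return frozenset(points), {frozenset(U) for U in opens}


def closed_sets(X, T):
    return {X - U for U in T}


def is_t1(X, T):
    # (i): every one-point set is closed.
    return all(frozenset({x}) in closed_sets(X, T) for x in X)


def is_hausdorff(X, T):
    # (ii): distinct points have disjoint open neighborhoods.
    return all(
        any(x in U and y in V and not (U & V) for U in T for V in T)
        for x in X for y in X if x != y
    )


def is_regular(X, T):
    # (iii): a point and a closed set not containing it can be separated.
    return all(
        any(x in U and F <= V and not (U & V) for U in T for V in T)
        for F in closed_sets(X, T) for x in X - F
    )


def is_normal(X, T):
    # (iv): disjoint closed sets can be separated.
    C = closed_sets(X, T)
    return all(
        any(E <= U and F <= V and not (U & V) for U in T for V in T)
        for E in C for F in C if not (E & F)
    )


def properties(X, T):
    return {
        "T1": is_t1(X, T),
        "Hausdorff": is_hausdorff(X, T),
        "regular": is_regular(X, T),
        "normal": is_normal(X, T),
    }


# Example 1 (topology assumed as stated in the lead-in) and Example 5.
X1, T_ex1 = space("abc", ["", "a", "ab", "ac", "abc"])
X5, T_ex5 = space("ab", ["", "a", "ab"])

print(properties(X1, T_ex1))
print(properties(X5, T_ex5))
```

In Example 5 the only disjoint pairs of closed sets involve the empty set, so normality holds vacuously, while the point $a$ cannot be separated from the closed set $\{b\}$.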


We shall see in Section 5 that the Hausdorff property is exactly the right condi- 
tion to make limits be unique, hence to allow a reasonable notion of convergence. 
Also, in the construction of a quotient space, it is often a subtle matter to decide 
whether the quotient space is Hausdorff; we shall obtain sufficient conditions in 
Section 6. 

The property of regularity makes possible a generalization of the passage from 
a pseudometric space of points to a metric space of equivalence classes. The 
point of departure is the following proposition; we shall examine the resulting 
quotient space further in Section 6. 
Proposition 10.7. Let $X$ be a regular topological space. For points $x$ and $y$ 
in $X$, define $x \sim y$ if $x$ is in $\{y\}^{\mathrm{cl}}$. Then $\sim$ is an 
equivalence relation. 


PROOF. Certainly $x$ lies in $\{x\}^{\mathrm{cl}}$, and if $x$ lies in 
$\{y\}^{\mathrm{cl}}$ and $y$ lies in $\{z\}^{\mathrm{cl}}$, then $x$ lies in 
$\{z\}^{\mathrm{cl}}$. For the symmetry property, we argue by contradiction and 
use the regularity of $X$. Suppose that $x$ lies in $\{y\}^{\mathrm{cl}}$ but $y$ 
does not lie in $\{x\}^{\mathrm{cl}}$. Regularity allows us to find disjoint open 
sets $U$ and $V$ such that $y \in U$ and $\{x\}^{\mathrm{cl}} \subseteq V$. Then 
the closed set $V^c$ contains $y$ and hence also $\{y\}^{\mathrm{cl}}$. Since $x$ 
lies in $\{y\}^{\mathrm{cl}}$, $x$ lies in $V^c$. But this relationship contradicts 
the fact that $x$ lies in $V$. We conclude that $\sim$ is symmetric and is 
therefore an equivalence relation. 
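The role of regularity in Proposition 10.7 can be seen on two small finite spaces. In the sketch below, which is only an illustration, the first space is regular because every closed set is also open, and the relation turns out to be an equivalence relation; the second is the two-point space of Example 5, which is not regular, and symmetry fails. The function names are hypothetical.

```python
def closure(X, T, A):
    """Smallest closed set containing A: intersect all closed supersets of A."""
    result = X
    for U in T:
        C = X - U          # C is closed because U is open
        if A <= C:
            result = result & C
    return result


def closure_relation(X, T):
    """Pairs (x, y) with x in the closure of {y}."""
    return {(x, y) for x in X for y in X
            if x in closure(X, T, frozenset({y}))}


def is_equivalence(X, rel):
    reflexive = all((x, x) in rel for x in X)
    symmetric = all((y, x) in rel for (x, y) in rel)
    transitive = all((x, z) in rel
                     for (x, y) in rel for (w, z) in rel if y == w)
    return reflexive and symmetric and transitive


# A regular space: the open sets are unions of the blocks {1} and {2, 3}.
X_reg = frozenset({1, 2, 3})
T_reg = {frozenset(), frozenset({1}), frozenset({2, 3}), X_reg}

# The non-regular two-point space of Example 5.
X_ex5 = frozenset({"a", "b"})
T_ex5 = {frozenset(), frozenset({"a"}), X_ex5}

print(is_equivalence(X_reg, closure_relation(X_reg, T_reg)))
print(is_equivalence(X_ex5, closure_relation(X_ex5, T_ex5)))
```

In the second space $b$ lies in $\{a\}^{\mathrm{cl}} = X$ but $a$ does not lie in $\{b\}^{\mathrm{cl}} = \{b\}$, which is the failure of symmetry that the proof rules out under regularity.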


Subspaces of topological spaces inherit certain properties if the original space 
has them. Among these are $T_1$, Hausdorff, and separable. A subspace of a 
normal space need not be normal, as is seen by taking $X = \{a, b, c, d\}$ and 
$\mathcal{T} = \{\varnothing, \{a\}, \{a, b\}, \{a, c\}, \{a, b, c\}, \{a, b, c, d\}\}$, 
the subspace being $\{a, b, c\}$ and the relatively closed subsets of interest 
being $\{b\}$ and $\{c\}$. Let us state the result for regularity as a 
proposition. 


Proposition 10.8. A subspace of a regular topological space is regular. 


PROOF. Within a subspace $A$ of $X$, let $F$ be a relatively closed set, and let 
$x$ be a point of $A$ not in $F$. By Proposition 10.3 we have 
$F = F^{\mathrm{cl}} \cap A$, the closure being taken in $X$. Since $x$ is in $A$ 
but not $F$, $x$ is not in $F^{\mathrm{cl}}$. Since $X$ is regular, we can find 
disjoint open sets $U$ and $V$ in $X$ with $x \in U$ and 
$F^{\mathrm{cl}} \subseteq V$. Then $U \cap A$ and $V \cap A$ are disjoint 
relatively open sets containing $x$ and $F$. 
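The four-point example preceding Proposition 10.8, in which a normal space has a subspace that is not normal, can be confirmed by exhaustive search. The sketch below is an illustration with hypothetical function names; it assumes that the topology on $\{a, b, c, d\}$ includes the set $\{a, b, c\}$, which is needed for closure under unions.

```python
def closed_sets(X, T):
    return {X - U for U in T}


def is_normal(X, T):
    """Disjoint closed sets E, F have disjoint open supersets U, V."""
    C = closed_sets(X, T)
    return all(
        any(E <= U and F <= V and not (U & V) for U in T for V in T)
        for E in C for F in C if not (E & F)
    )


def relative_topology(T, A):
    """All sets U ∩ A with U in T."""
    return {U & A for U in T}


# Assumed topology: note the set "abc", required for closure under unions.
X = frozenset("abcd")
T = {frozenset(s) for s in ["", "a", "ab", "ac", "abc", "abcd"]}

A = frozenset("abc")
T_A = relative_topology(T, A)

print(is_normal(X, T), is_normal(A, T_A))
```

Every nonempty closed set of $X$ contains $d$, so no two nonempty closed sets are disjoint and normality of $X$ is vacuous; in the subspace, $\{b\}$ and $\{c\}$ are disjoint closed sets whose open neighborhoods all contain $a$.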


As with metric spaces, a subset $D$ of a topological space $X$ is dense in $A$ if 
$D^{\mathrm{cl}} \supseteq A$; $D$ is dense if $D$ is dense in $X$. A set $D$ is 
dense if and only if there is some point of $D$ in each nonempty open set of $X$. 
If $X$ is separable, then $X$ has a countable dense set; we have only to select 
one point from each nonempty member of the base. 

The properties of bases of a topological space $X$ become more transparent 
with the aid of the notion of a local base. A set $\mathcal{U}_x$ of open 
neighborhoods of $x$ is a local base at $x$ if each open set containing $x$ 
contains some member of $\mathcal{U}_x$. If $\mathcal{B}$ is a base, then the 
members of $\mathcal{B}$ containing $x$ form a local base at $x$. Conversely if 
$\mathcal{U}_x$ is a local base for each $x$, then the union of all the 
$\mathcal{U}_x$’s is a base. We say that $X$ has a countable local base at each 
point⁵ if a countable such $\mathcal{U}_x$ can be chosen for each $x$ in $X$. 
Metric spaces have this property; the open balls of rational radii centered at a 
point form a local base at the point. 


5. Some authors say instead that “X satisfies the first axiom of countability” or “X is first countable” 
if this condition holds. In the same kind of terminology, one says that “X satisfies the second axiom 
of countability” or “X is second countable” if X is separable in the sense of Section 1. 
EXAMPLE 4, CONTINUED. A space that has a countable dense set and has a 
countable local base at each point and yet is not separable. As in Example 4 
earlier in this section, let $X$ be the closed upper half plane 
$\{\operatorname{Im} z \ge 0\}$ in $\mathbb{C}$. A base for $\mathcal{T}$ consists 
of all open disks in $X$ that do not meet the $x$ axis, together with all open 
disks in $X$ that are tangent to the $x$ axis; the latter sets are to include 
the point of tangency. For a point $p$ on the $x$ axis, the open disks of rational 
radii with point of tangency $p$ form a countable local base, and for a point $p$ 
off the $x$ axis, the open disks within the open upper half plane having center 
$p$ and rational radius form a countable local base. A countable dense set 
consists of all points with rational coordinates and with $y$ coordinate positive. 
We shall see in Corollary 10.10 in the next section that a separable regular 
space has to be normal, and this $X$ is not normal, according to the statement in 
Example 4 and the proof to be given in Section 7. Thus $X$ cannot be separable. 


3. Compactness and Local Compactness 


Let $X$ be a topological space. In this section we carry over to a general 
topological space $X$ some definitions made in Section II.7 for metric spaces. A 
collection $\mathcal{U}$ of open sets is an open cover of $X$ if its union is $X$. 
An open subcover of $\mathcal{U}$ is a subset of $\mathcal{U}$ that is itself an 
open cover. 

We begin with a new term, saying that the topological space $X$ is a Lindelöf 
space if every open cover of $X$ has a countable subcover. Proposition 2.32 showed 
that a metric space $X$ is separable if and only if $X$ is a Lindelöf space. For 
general topological spaces it is still true that any separable $X$ is a Lindelöf 
space, by the same argument as for the implication that condition (a) implies 
condition (b) in Proposition 2.32. In fact, every subspace of a separable space is 
separable, and hence every subspace of a separable space is Lindelöf. However, a 
Lindelöf space need not be separable, as the following example shows rather 
emphatically. 


EXAMPLE. We construct a topological space $(X, \mathcal{T})$ that is Hausdorff and 
normal, has a countable dense set, has a countable local base at each point, is 
Lindelöf, yet is not separable. Take $X$ as a set to be the real line. The 
intersection of any two bounded intervals of the form $[a, b)$ is an interval of 
the same kind, and the union of all such intervals is the whole line. Hence the 
bounded intervals $[a, b)$ form a base for some topology on the line, and this 
topology we take to be $\mathcal{T}$. It is called the half-open interval topology 
for the real line. Since every ordinary open interval of the line is the union of 
intervals $[a, b)$, any open set in the usual metric topology is open in the 
half-open interval topology. Any two distinct points of $X$ may be separated by 
ordinary disjoint open intervals, and therefore $X$ is Hausdorff. To see that $X$ 
is regular, let a point $x$ and a closed set 
$F$ with $x$ not in $F$ be given. Since $x$ is in the open set $F^c$, some 
$[x, x + \epsilon)$ is disjoint from $F$. Then $U = [x, x + \epsilon)$ and 
$V = (-\infty, x) \cup [x + \epsilon, +\infty)$ are disjoint open sets separating 
$x$ and $F$, and we conclude that $X$ is regular. Once we prove that $X$ is 
Lindelöf, it will follow from Proposition 10.9 below that $X$ is normal. The 
rationals form a countable dense subset of $X$, and the set of all intervals 
$[x, x + \frac{1}{n})$ is a countable local base at $x$. The space $X$ is not 
separable. In fact, if $\mathcal{B}$ is any base, we can find, for each $x$, some 
open neighborhood $B_x$ of $x$ that is in $\mathcal{B}$ and is contained in 
$[x, x + 1)$. If $x < y$, then $x$ cannot lie in $B_y$ and hence $B_x \neq B_y$; 
therefore $\mathcal{B}$ has to be uncountable. Finally let us see that $X$ is 
Lindelöf. Let an open cover $\mathcal{U}$ of $X$ be given, and fix a negative real 
number $x_0$. Consider the set $S(x_0)$ of all real numbers $x$ such that some 
countable collection of members of $\mathcal{U}$ covers $[x_0, x]$. Since $x_0$ is 
covered by some member of $\mathcal{U}$, the set $S(x_0)$ contains $x_0$. If the 
set contains an element $x_1$, then the member of the countable collection that 
covers $x_1$ must contain $[x_1, x_1 + \epsilon)$ for some $\epsilon > 0$. Thus 
$x_1 + \frac{\epsilon}{2}$ is in $S(x_0)$, and $S(x_0)$ contains no largest 
element. We shall show that $S(x_0) = [x_0, +\infty)$. If the contrary is true, 
then $S(x_0)$ must be bounded. In this case, let $c$ be the least upper bound. For 
large enough $n$, $c - \frac{1}{n}$ is in $S(x_0)$. Taking the union of the 
countable collections that cover $[x_0, c - \frac{1}{n}]$, together with one more 
set to cover $c$, we obtain a countable collection that covers $[x_0, c]$, and we 
see that $c$ is in $S(x_0)$. Thus $c$ is the largest element of $S(x_0)$, and we 
have a contradiction to the fact that $S(x_0)$ contains no largest element. We 
conclude that some countable subcollection of $\mathcal{U}$ covers 
$[x_0, +\infty)$, no matter what $x_0$ is. Taking the union of the countable 
subcollections corresponding to each negative integer, we obtain a countable 
subcollection of $\mathcal{U}$ covering $(-\infty, +\infty)$. Thus $X$ is 
Lindelöf. 


It is not always so obvious when a topological space is normal. The next result 
provides one sufficient condition. 


Proposition 10.9 (Tychonoff’s Lemma). Every regular Lindelöf space is 
normal. 


PROOF. Let $X$ be regular and Lindelöf, and let disjoint closed subsets $E$ and 
$F$ of $X$ be given. By regularity and Proposition 10.5b each point of $E$ has an 
open neighborhood whose closure is disjoint from $F$. Therefore the class 
$\mathcal{U}$ of open sets with closures disjoint from $F$ covers $E$. Similarly 
the class $\mathcal{V}$ of open sets with closures disjoint from $E$ covers $F$. 
Thus $\mathcal{U} \cup \mathcal{V} \cup \{X - (E \cup F)\}$ is an open cover of 
$X$. Since $X$ is Lindelöf, there exist sequences of sets $U_n$ in $\mathcal{U}$ 
and $V_n$ in $\mathcal{V}$ such that $E \subseteq \bigcup_n U_n$ and 
$F \subseteq \bigcup_n V_n$. Put 


$$U_n' = U_n - \bigcup_{k \le n} V_k^{\mathrm{cl}} \quad\text{and}\quad V_n' = V_n - \bigcup_{k \le n} U_k^{\mathrm{cl}}.$$




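When $m \le n$, we have $V_m \subseteq \bigcup_{k \le n} V_k^{\mathrm{cl}}$. Then 
$U_n' \cap V_m = \varnothing$, and hence the smaller set $U_n' \cap V_m'$ is empty. 
Reversing the roles of the $U$’s and the $V$’s shows that $U_n' \cap V_m'$ is empty 
for $m \ge n$. Therefore $U_n' \cap V_m' = \varnothing$ for all $n$ and $m$. Define 


$$U = \bigcup_{n=1}^{\infty} U_n' \quad\text{and}\quad V = \bigcup_{m=1}^{\infty} V_m'.$$


Then $U \cap V = \bigcup_{n,m} (U_n' \cap V_m') = \varnothing$. Also, 


$$E \cap U = E \cap \bigcup_{n=1}^{\infty} \Big(U_n - \bigcup_{k \le n} V_k^{\mathrm{cl}}\Big) \supseteq E \cap \bigcup_{n=1}^{\infty} \Big(U_n - \bigcup_{k=1}^{\infty} V_k^{\mathrm{cl}}\Big) = E \cap \Big(X - \bigcup_{k=1}^{\infty} V_k^{\mathrm{cl}}\Big),$$


the last equality holding since $\{U_n\}$ covers $E$. The right side here equals 
$E$ since $V_k^{\mathrm{cl}} \subseteq X - E$ for all $k$, and therefore 
$E \subseteq U$. Similarly $F \subseteq V$. The proof is complete. 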


Corollary 10.10. Every regular separable space is normal. 


PROOF. A separable space is automatically Lindelöf, and thus the corollary 
follows from Proposition 10.9. 


Let us return to the concluding example in Section 2, in which $X$ as a set is 
the closed upper half plane $\{\operatorname{Im} z \ge 0\}$ but in which the 
topology is nonstandard near the real axis. It was shown in Section 2 that this 
particular $X$ is regular, and it was stated that Urysohn’s Lemma would be used in 
Section 7 to show that $X$ is not normal. By Corollary 10.10, $X$ cannot be 
separable. This completes the argument that $X$ has a countable dense set and has 
a countable local base at each point yet is not separable. 

We can now proceed with carrying over some definitions from Section II.7, 
valid there for metric spaces, to a general topological space $X$. We call $X$ 
compact if every open cover of $X$ has a finite subcover. A subset $E$ of $X$ is 
compact if it is compact as a subspace of $X$, i.e., if every collection of open 
sets in $X$ whose union contains $E$ has a finite subcollection whose union 
contains $E$. It is immediate from the definition that the union of two compact 
subsets is compact. 

This definition generalizes the property of closed bounded sets of $\mathbb{R}^n$ 
given by the Heine–Borel Theorem. We shall see that the Heine–Borel property, 
rather than the Bolzano–Weierstrass property for sequences, is the useful property 
to carry over to more general situations in real analysis. In fact, in several 
places in this book, we have combined an iterated application of the 
Bolzano–Weierstrass property with the Cantor diagonal process to obtain some 
conclusion. This construction is tantamount to proving that the product of 
countably many compact metric spaces, which is a metric space essentially by 
Proposition 10.28 below, is compact. There will be situations for which we want to 
consider an uncountable product of compact metric spaces, and then arguments with 
sequences are not decisive. Instead, it is the Heine–Borel property that is 
relevant. The Tychonoff Product Theorem of Section 4 will be the substitute for the 
Cantor diagonal process, and the use of nets, considered in Section 5, will be 
analogous to the use of sequences. 

A number of the simpler results in Section II.7 generalize easily from compact 
metric spaces to all compact topological spaces or at least to all compact Hausdorff 
spaces. We list those now. A consequence of Proposition 10.12 below is that 
compactness is preserved under homeomorphisms. 

A set of subsets of a nonempty set is said to have the finite-intersection 
property if each intersection of finitely many of the subsets is nonempty. 


Proposition 10.11. A topological space X is compact if and only if each 
set of closed subsets of X with the finite-intersection property has nonempty 
intersection. 


PROOF. Closed sets with the finite-intersection property have complements 
that are open sets, no finite subcollection of which is an open cover. 


Proposition 10.12. Let $X$ and $Y$ be topological spaces with $X$ compact. If 
$f : X \to Y$ is continuous, then $f(X)$ is a compact subset of $Y$. 


PROOF. If $\{U_\alpha\}$ is an open cover of $f(X)$, then 
$\{f^{-1}(U_\alpha)\}$ is an open cover of $X$. Let 
$\{f^{-1}(U_{\alpha_j})\}_{j=1}^{n}$ be a finite subcover. Then 
$\{U_{\alpha_j}\}_{j=1}^{n}$ is a finite subcover of $f(X)$. 


Corollary 10.13. Let $X$ be a compact topological space, and let 
$f : X \to \mathbb{R}$ be a continuous function. Then $f$ attains its maximum and 
minimum values. 


PROOF. By Proposition 10.12, $f(X)$ is a compact subset of $\mathbb{R}$. Arguing 
as in the proof of Corollary 2.39, we see that $f(X)$ has a finite supremum and a 
finite infimum and that both of these must lie in $f(X)$. 


Proposition 10.14. A closed subset of a compact topological space is compact. 


PROOF. Let $E$ be a closed subset of the compact space $X$, and let $\mathcal{U}$ 
be an open cover of $E$. Then $\mathcal{U} \cup \{E^c\}$ is an open cover of $X$. 
Passing to a finite subcover and discarding $E^c$, we obtain a finite subcover of 
$E$. Thus $E$ is compact. 


Lemma 10.15. Let $K$ and $E$ be subsets of a topological space $X$, and let $K$ be 
compact. Suppose that to each point $x$ of $K$ there are disjoint open sets $U_x$ 
and $V_x$ such that $x$ is in $U_x$ and $E \subseteq V_x$. Then there exist 
disjoint open sets $U$ and $V$ such that $K \subseteq U$ and $E \subseteq V$. 
PROOF. As $x$ varies through $K$, the open sets $U_x$ form an open cover of $K$. 
By compactness, a finite subcollection of the $U_x$’s is a cover, say 
$U_{x_1}, \dots, U_{x_n}$. Put $U = \bigcup_{i=1}^{n} U_{x_i}$ and 
$V = \bigcap_{j=1}^{n} V_{x_j}$. Then $K \subseteq U$ and $E \subseteq V$. Also, 


$$U \cap V = \Big(\bigcup_{i=1}^{n} U_{x_i}\Big) \cap \Big(\bigcap_{j=1}^{n} V_{x_j}\Big) = \bigcup_{i=1}^{n} \Big(U_{x_i} \cap \bigcap_{j=1}^{n} V_{x_j}\Big) \subseteq \bigcup_{i=1}^{n} (U_{x_i} \cap V_{x_i}) = \varnothing,$$

and thus $U$ and $V$ have the required properties. 


Proposition 10.16. Every compact Hausdorff space is regular and normal. 


PROOF. Let $X$ be compact Hausdorff. If a point $x$ and a closed set $F$ with 
$x \notin F$ are given, we observe by Proposition 10.14 that $F$ is compact. The 
Hausdorff property of $X$ allows us to take $E = \{x\}$ and $K = F$ in Lemma 
10.15, and we obtain disjoint open sets $U$ and $V$ such that $x$ is in $V$ and 
$F \subseteq U$. Thus $X$ is regular. 

If disjoint closed sets $E$ and $F$ are given, then $F$ is compact by Proposition 
10.14. The fact that $X$ has been shown to be regular allows us to take $K = F$ in 
Lemma 10.15, and we obtain disjoint open sets $U$ and $V$ such that 
$E \subseteq V$ and $F \subseteq U$. Thus $X$ is normal. 


Proposition 10.17. In a Hausdorff space every compact set is closed. 


PROOF. Let $X$ be a Hausdorff space, and let $K$ be a compact subset of $X$. Fix 
$x$ in $K^c$. The Hausdorff property of $X$ allows us to take $E = \{x\}$ in Lemma 
10.15, and we obtain disjoint open sets $U_x$ and $V_x$ such that $x$ is in $V_x$ 
and $K \subseteq U_x$. Letting $x$ now vary, we see that 
$K^c = \bigcup_{x \in K^c} V_x$. Hence $K^c$ is open and $K$ is closed. 


Corollary 10.18. Let $X$ and $Y$ be topological spaces with $X$ compact and 
with $Y$ Hausdorff. If $f : X \to Y$ is continuous, one-one, and onto, then $f$ is 
a homeomorphism. 


PROOF. We are to show that $f^{-1} : Y \to X$ is continuous. Let $E$ be a closed 
subset of $X$, and consider $(f^{-1})^{-1}(E) = f(E)$. The set $E$ is compact in 
$X$ by Proposition 10.14, $f(E)$ is compact by Proposition 10.12, and $f(E)$ is 
closed by Proposition 10.17. Since the inverse image under $f^{-1}$ of any closed 
set is closed, $f^{-1}$ is continuous. 


A topological space is locally compact if every point has a compact neigh- 
borhood. Compact spaces are locally compact, but the real line with its usual 
topology is locally compact and not compact. In a sense to be made precise in 
the next two propositions, locally compact Hausdorff spaces are just one point 
away from being compact Hausdorff. 

Let $(X, \mathcal{T})$ be an arbitrary topological space. Define a new set $X^*$ 
by $X^* = X \cup \{\infty\}$, where $\infty$ is not already a member of $X$, and 
define $\mathcal{T}^*$ to be the union of $\mathcal{T}$ and the set of all 
complements in $X^*$ of closed compact subsets of $X$. We shall verify in 
Proposition 10.19 that $\mathcal{T}^*$ is a topology for $X^*$. The topological 
space $(X^*, \mathcal{T}^*)$ is called the one-point compactification of 
$(X, \mathcal{T})$. By way of examples, the one-point compactification of 
$\mathbb{R}$ may be visualized as a circle and the one-point compactification of 
$\mathbb{R}^2$ may be visualized as a sphere. 


Proposition 10.19. If $(X, \mathcal{T})$ is a topological space, then 
$(X^*, \mathcal{T}^*)$ is a compact topological space, $X$ is an open subset of 
$X^*$, and the relative topology for $X$ in $X^*$ is $\mathcal{T}$. 


PROOF. To see that $\mathcal{T}^*$ is a topology, we observe first that 
$\varnothing$ and $X^*$ are in $\mathcal{T}^*$. If $U$ and $V$ are in 
$\mathcal{T}^*$, there are three cases in checking that $U \cap V$ is in 
$\mathcal{T}^*$: If $U$ and $V$ are both in $\mathcal{T}$, then $U \cap V$ is in 
$\mathcal{T}$ since $\mathcal{T}$ is closed under finite intersections. If $U$ is 
in $\mathcal{T}$ and $V$ is not, then $V^c$ is closed compact in $X$, and 
$X - V^c$ is thus open in $X$; since $\mathcal{T}$ is closed under finite 
intersections, $U \cap V = U \cap (X - V^c)$ is in $\mathcal{T}$. If $U$ and $V$ 
are not in $\mathcal{T}$, then the complements $U^c$ and $V^c$ in $X^*$ are closed 
compact subsets of $X$; so is their union $(U \cap V)^c$, and hence $U \cap V$ is 
in $\mathcal{T}^*$. 

We still have to check closure of $\mathcal{T}^*$ under arbitrary unions. Suppose 
that $U_\alpha$ is in $\mathcal{T}$ for $\alpha$ in an index set $A$ and $V_\beta$ 
has closed compact complement for $\beta$ in an index set $B$. Then 
$\bigcup_{\alpha \in A} U_\alpha$ is in $\mathcal{T}$, and if $B$ is nonempty, 
$\bigcap_{\beta \in B} V_\beta^c$ is a closed subset of one $V_\beta^c$ and hence 
is compact; in this case, $\big(\bigcup_{\beta \in B} V_\beta\big)^c$ is closed 
compact in $X$, and hence $\bigcup_{\beta \in B} V_\beta$ is in $\mathcal{T}^*$. 
Thus we have only to check that $U \cup V$ is in $\mathcal{T}^*$ if $U$ is in 
$\mathcal{T}$ and $V^c$ is closed compact in $X$. As the intersection of two 
closed sets, one of which is compact, $(X - U) \cap V^c = (X - U) \cap (X - V)$ is 
closed and compact in $X$, and thus $U \cup V = ((X - U) \cap V^c)^c$ is in 
$\mathcal{T}^*$. Thus $\mathcal{T}^*$ is a topology. 

To see that $X^*$ is compact, let $\mathcal{U}$ be an open cover of $X^*$. Find 
some $V$ in $\mathcal{U}$ containing the point $\infty$. The members of 
$\mathcal{U}$ cover the compact subset $V^c$ of $X$, and their intersections with 
$X$ are open in $X$; hence there is a finite subcollection $\mathcal{V}$ of 
$\mathcal{U}$ that covers $V^c$. Then $\mathcal{V} \cup \{V\}$ is a finite 
subcollection of $\mathcal{U}$ that covers $X^*$. 

The set $X$ is in $\mathcal{T}$ and is therefore in $\mathcal{T}^*$. Thus $X$ is 
open in $X^*$. To complete the proof, we are to show that 
$\mathcal{T}^* \cap X = \mathcal{T}$. We know that 
$\mathcal{T}^* \cap X \supseteq \mathcal{T}$. If $V$ is a member of 
$\mathcal{T}^*$ that does not lie in $\mathcal{T}$, then $V^c$ is closed compact 
in $X$, and its complement $X - V^c = V \cap X$ in $X$ is open in $X$. Hence 
$V \cap X$ is in $\mathcal{T}$. 
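For a finite space the construction of $\mathcal{T}^*$ can be carried out explicitly, since every subset of a finite space is compact and the closed compact subsets are then just the closed subsets. The sketch below is a degenerate illustration under that finiteness assumption; the function names and the label `"oo"` for the added point are hypothetical.

```python
def one_point_compactification(X, T, infinity="oo"):
    """Return (X*, T*) for a FINITE space (X, T)."""
    X = frozenset(X)
    T = {frozenset(U) for U in T}
    if infinity in X:
        raise ValueError("the added point must not already belong to X")
    X_star = X | {infinity}
    # In a finite space every subset is compact, so the closed compact
    # subsets of X are exactly the closed sets X - U with U in T.
    closed_compact = {X - U for U in T}
    T_star = T | {X_star - C for C in closed_compact}
    return X_star, T_star


def is_topology(X, T):
    """Check the topology axioms on a finite set X (pairwise checks suffice)."""
    if frozenset() not in T or X not in T:
        return False
    return all((U | V) in T and (U & V) in T for U in T for V in T)


# The two-point space of Example 5 in Section 2.
X = frozenset({"a", "b"})
T = {frozenset(), frozenset({"a"}), X}

X_star, T_star = one_point_compactification(X, T)
relative = {W & X for W in T_star}   # relative topology of X in X*

print(is_topology(X_star, T_star), X in T_star, relative == T)
```

The three printed checks correspond to three conclusions of Proposition 10.19 for this sample space: $\mathcal{T}^*$ is a topology, $X$ is open in $X^*$, and the relative topology for $X$ is $\mathcal{T}$.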


Proposition 10.20. If X* is the one-point compactification of a topological 
space X, then X* is Hausdorff if and only if X is both locally compact and 
Hausdorff. 


PROOF. Suppose that $X$ is locally compact and Hausdorff. Since $X$ is Hausdorff, 
any two points of $X$ can be separated by disjoint open sets in $X$, and these 
sets will be open in $X^*$. To separate a point $x$ in $X$ from $\infty$, let $C$ 
be a compact neighborhood of $x$ in $X$. Since $X$ is Hausdorff, $C$ is closed in 
$X$. Thus $C^c$ is in $\mathcal{T}^*$. Then $C^{\mathrm{o}}$ and $C^c$ are disjoint 
open sets in $X^*$ such that $x$ is in $C^{\mathrm{o}}$ and $\infty$ is in $C^c$, 
and $X^*$ is Hausdorff. 

Conversely suppose that $X^*$ is Hausdorff. Proposition 10.19 shows that $X$ is 
a subspace of $X^*$. Since any subspace of a Hausdorff space is Hausdorff, $X$ is 
Hausdorff. To see that $X$ is locally compact, let $x$ be in $X$, and find 
disjoint open sets $U$ and $V$ in $X^*$ such that $x$ is in $U$ and $\infty$ is in 
$V$. Then $U$ must be in $\mathcal{T}$, and $V^c$ must be closed compact in $X$. 
Since $U \cap V = \varnothing$, $U \subseteq V^c$. This inclusion exhibits $V^c$ 
as a compact neighborhood of $x$, and thus $X$ is locally compact. 


Corollary 10.21. Every locally compact Hausdorff space is regular. 


PROOF. If X is locally compact Hausdorff, Propositions 10.19 and 10.20 show 
that the one-point compactification X* is compact Hausdorff and allow us to 
regard X as a subspace of X*. Proposition 10.16 shows that X* is regular, and 
Proposition 10.8 shows that X is therefore regular. 


A locally compact Hausdorff space need not be normal; an example is given 
in Problem 5 at the end of the chapter. The remainder of this section concerns 
senses in which a locally compact Hausdorff space is almost normal. 


Corollary 10.22. If $K$ and $F$ are disjoint closed sets in a locally compact 
Hausdorff space and if $K$ is compact, then there exist disjoint open sets $U$ and 
$V$ such that $K \subseteq U$ and $F \subseteq V$. 


PROOF. This is immediate from Lemma 10.15 and Corollary 10.21. 


Corollary 10.23. If $K$ is a compact set in a locally compact Hausdorff space, 
then there is a compact set $L$ such that $K \subseteq L^{\mathrm{o}}$. 


PROOF. Let $X$ be locally compact Hausdorff, and form the one-point 
compactification $X^*$. Since $X^*$ is compact Hausdorff by Proposition 10.20, 
Proposition 10.17 shows that $K$ is closed in $X^*$ and Proposition 10.16 shows 
that $X^*$ is regular. Thus Proposition 10.5b shows that we can find an open set 
$U$ in $X^*$ such that $\infty$ is in $U$ and 
$U^{\mathrm{cl}} \cap K = \varnothing$. Then 
$K \subseteq X^* - U^{\mathrm{cl}} \subseteq X^* - U$. By definition of the 
topology of $X^*$, the set $L = X^* - U$ is compact in $X$. Its subset 
$X^* - U^{\mathrm{cl}}$ is open and is therefore contained in $L^{\mathrm{o}}$. 
Thus $K \subseteq L^{\mathrm{o}} \subseteq L$ with $L$ compact. 


A topological space is called $\sigma$-compact if there is a sequence of compact 
sets with union the whole space. The real line with its usual topology is 
$\sigma$-compact. For that matter, so is the subspace of rationals since each 
finite subset is compact. 
Proposition 10.24. A locally compact topological space is $\sigma$-compact if and 
only if it is Lindelöf. Consequently every $\sigma$-compact locally compact 
Hausdorff space is normal. 


PROOF. If $X$ is $\sigma$-compact, write $X = \bigcup_{n=1}^{\infty} K_n$ with 
$K_n$ compact. If $\mathcal{U}$ is an open cover of $X$, then $\mathcal{U}$ is an 
open cover of each $K_n$, and there is a finite subcover $\mathcal{U}_n$ of $K_n$. 
Then $\bigcup_{n=1}^{\infty} \mathcal{U}_n$ is a countable subcover of 
$\mathcal{U}$, and $X$ is Lindelöf. 

Conversely if $X$ is locally compact and Lindelöf, choose, for each $x$ in $X$, a 
compact neighborhood $K_x$ of $x$, and let $U_x$ be the interior of $K_x$. As $x$ 
varies, the $U_x$ form an open cover of $X$. Since $X$ is Lindelöf, there is a 
countable subcover $\{U_{x_n}\}_{n=1}^{\infty}$. Since we have 
$U_{x_n} \subseteq K_{x_n}$ for all $n$, $\{K_{x_n}\}_{n=1}^{\infty}$ is a 
sequence of compact sets with union $X$. Hence $X$ is $\sigma$-compact. 

Finally if $X$ is locally compact Hausdorff and $\sigma$-compact, hence also 
Lindelöf, then Corollary 10.21 shows that $X$ is regular, and Tychonoff’s Lemma 
(Proposition 10.9) shows that $X$ is normal. 


Proposition 10.25. In a $\sigma$-compact locally compact Hausdorff space, there 
exists an increasing sequence $\{K_n\}$ of compact sets with union the whole space 
and with $K_n \subseteq K_{n+1}^{\mathrm{o}}$ for all $n$. 


PROOF. Let $X$ be a locally compact Hausdorff space such that 
$X = \bigcup_{n=1}^{\infty} L_n$ with $L_n$ compact. Replacing $L_n$ by the union 
of the previous members of the sequence, we may assume that 
$L_n \subseteq L_{n+1}$ for all $n \ge 1$. Put $L_0 = K_0 = \varnothing$. Use 
Corollary 10.23 to choose $K_1$ compact with $L_1 \subseteq K_1^{\mathrm{o}}$. 

Inductively suppose that $n > 0$ and that for all $k$ with $0 < k \le n$, a 
compact set $K_k$ has been defined such that 
$L_k \cup K_{k-1} \subseteq K_k^{\mathrm{o}}$. Applying Corollary 10.23, we can 
find a compact set $K_{n+1}$ such that the compact set $L_{n+1} \cup K_n$ is 
contained in $K_{n+1}^{\mathrm{o}}$. Then $K_{k-1} \subseteq K_k^{\mathrm{o}}$ for 
all $k \ge 1$ as required, and $X = \bigcup_{k=1}^{\infty} K_k$ since 
$K_k \supseteq L_k$ and $\bigcup_{k=1}^{\infty} L_k = X$. 


4. Product Spaces and the Tychonoff Product Theorem 


The product topology for the product of topological spaces was discussed briefly 
in Section 1. If $S$ is a nonempty set and if $X_s$ is a topological space for 
each $s$ in $S$, then the Cartesian product $X = \prod_{s \in S} X_s$, as a set, 
is the set of all functions $f$ from $S$ into $\bigcup_{s \in S} X_s$ such that 
$f(s)$ is in $X_s$ for all $s \in S$. The topology that is imposed on $X$ is, by 
definition, the weakest topology that makes the $s$th coordinate function 
$p_s : X \to X_s$ be continuous for every $s$. 

Let us investigate what sets have to be open in this topology, and then we can
look at examples and see better what the topology is. If $U_s$ is any open subset
of $X_s$, then $p_s^{-1}(U_s)$ has to be open in $X$ since $p_s$ is continuous. For example,
if $S = \{1, 2\}$, we are considering $X = X_1 \times X_2$. A set $p_1^{-1}(U_1)$ is of the form
$U_1 \times X_2$, and a set $p_2^{-1}(U_2)$ is of the form $X_1 \times U_2$. These have to be open if
$U_1$ is open in $X_1$ and $U_2$ is open in $X_2$. The intersection of any two such sets,
which is of the form $U_1 \times U_2$, has to be open in $X$, as well. We do not need to
intersect these sets further, since $p_1^{-1}(U_1) \cap p_1^{-1}(V_1) = p_1^{-1}(U_1 \cap V_1)$. By the
remark with Proposition 10.1, the sets $p_1^{-1}(U_1) \cap p_2^{-1}(U_2)$ with $U_1$ open in $X_1$
and $U_2$ open in $X_2$ form a base for some topology on $X = X_1 \times X_2$. These sets
have to be open in the product topology, and $p_1$ and $p_2$ are indeed continuous in
this topology. Therefore the product topology on $X = X_1 \times X_2$ has

\[ \{\, p_1^{-1}(U_1) \cap p_2^{-1}(U_2) \mid U_1 \text{ open in } X_1,\ U_2 \text{ open in } X_2 \,\} \]

as a base. More generally the product topology on $X = X_1 \times \cdots \times X_n$ has

\[ \Big\{\, \bigcap_{k=1}^{n} p_k^{-1}(U_k) \;\Big|\; U_k \text{ open in } X_k \text{ for each } k \,\Big\} \]

as a base.
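
The description of the base can be checked mechanically on small finite spaces. The following Python sketch is only an illustration: the two example spaces (a Sierpinski space and a two-point discrete space) and all function names are assumptions made here, not part of the text. It forms the sets $p_1^{-1}(U_1) \cap p_2^{-1}(U_2)$, compares each with the rectangle $U_1 \times U_2$, and generates a topology as all unions of base members.

```python
from itertools import combinations, product

def preimage(points, coord, open_set):
    """Return p_coord^{-1}(open_set) inside the finite product `points`."""
    return frozenset(p for p in points if p[coord] in open_set)

def product_base(X1, T1, X2, T2):
    """All sets p_1^{-1}(U1) & p_2^{-1}(U2) with U1 in T1 and U2 in T2."""
    points = set(product(X1, X2))
    return {preimage(points, 0, U1) & preimage(points, 1, U2)
            for U1 in T1 for U2 in T2}

def topology_from_base(base):
    """All unions of subfamilies of a finite base (the empty union is empty)."""
    base = list(base)
    opens = set()
    for r in range(len(base) + 1):
        for family in combinations(base, r):
            opens.add(frozenset().union(*family))
    return opens

# Hypothetical example spaces: a Sierpinski space and a two-point discrete space.
X1, T1 = {0, 1}, [frozenset(), frozenset({0}), frozenset({0, 1})]
X2, T2 = {"a", "b"}, [frozenset(), frozenset({"a"}), frozenset({"b"}),
                      frozenset({"a", "b"})]

points = set(product(X1, X2))
base = product_base(X1, T1, X2, T2)
topology = topology_from_base(base)

# Each base member should be the rectangle U1 x U2.
rectangles_match = all(
    preimage(points, 0, U1) & preimage(points, 1, U2) == frozenset(product(U1, U2))
    for U1 in T1 for U2 in T2
)
# Both coordinate functions should be continuous for the generated topology.
projections_continuous = (
    all(preimage(points, 0, U1) in topology for U1 in T1)
    and all(preimage(points, 1, U2) in topology for U2 in T2)
)
print(rectangles_match, projections_continuous)
```

The brute-force union over subfamilies is exponential in the size of the base, so this approach is suitable only for very small examples.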

When the index set $S$ is the set of positive integers, the product $X = \prod_{n \in S} X_n$,
as a set, is the set of sequences $\{f(n)\}_{n \in S}$. Again any set $p_n^{-1}(U_n)$ with $U_n$ open
in $X_n$ has to be open in $X$. Hence any finite intersection of such sets as $n$ varies
has to be open. But there is no need for infinite intersections of such sets to be
open, and a base for the product topology in fact consists of all finite intersections
of sets $p_n^{-1}(U_n)$ with $U_n$ open in $X_n$.

The use of finite intersections, and not infinite intersections, persists for all $S$
and gives us a description of a base for the product topology in general. When
$S = [0, 1]$ and all $X_s$ are $[0, 1]$, the description of the product topology has a
helpful geometric interpretation. The set $X$ consists of all functions from the
closed unit interval to itself, and we can visualize these in terms of their graphs.
A basic open set of such functions imposes restrictions at finitely many values of
$s$, i.e., at finitely many points of the domain. At such values of $s$, the graph of a
function in the basic open set is to pass through a certain window $U_s$ depending
on $s$. At all other values of $s$, the function is unrestricted.
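
The window picture translates directly into a membership test. The sketch below is a hedged illustration in Python; the function name, the choice of windows, and the three sample functions are all hypothetical. Each window is taken to be an open interval $(a, b)$ intersected with $[0, 1]$.

```python
def in_basic_open_set(f, windows):
    """Return True if the graph of f passes through every window.

    `windows` maps finitely many points s of [0, 1] to pairs (a, b); the window
    at s is the relatively open set (a, b) intersected with [0, 1].
    """
    return all(0.0 <= f(s) <= 1.0 and a < f(s) < b
               for s, (a, b) in windows.items())

# Hypothetical windows at s = 0.5 and s = 1.
windows = {0.5: (0.4, 0.6), 1.0: (0.9, 1.1)}

identity = lambda s: s
square = lambda s: s * s
# Arbitrary (here zero) away from the two restricted points.
spiky = lambda s: {0.5: 0.5, 1.0: 1.0}.get(s, 0.0)

print(in_basic_open_set(identity, windows))
print(in_basic_open_set(square, windows))
print(in_basic_open_set(spiky, windows))
```

The third function is wildly discontinuous, yet it lies in the same basic open set as the identity function, which reflects the fact that only finitely many coordinates are restricted.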


Proposition 10.26. The topological product of Hausdorff topological spaces 
is Hausdorff. 


PROOF. Let a product $X = \prod_{s \in S} X_s$ be given, let $p_s : X \to X_s$ be the $s^{\text{th}}$
coordinate function, and let two distinct members $f$ and $g$ of $X$ be given.
Members of $X$ are functions of a certain kind, and these two functions, being
distinct, have $f(s) \ne g(s)$ for some $s \in S$. Since $X_s$ is Hausdorff, we can choose
disjoint open sets $U_s$ and $V_s$ in $X_s$ such that $f(s)$ is in $U_s$ and $g(s)$ is in $V_s$. Then
$p_s^{-1}(U_s)$ and $p_s^{-1}(V_s)$ are disjoint open sets in $X$ such that $f$ is in $p_s^{-1}(U_s)$ and $g$
is in $p_s^{-1}(V_s)$.


Theorem 10.27 (Tychonoff Product Theorem). The topological product of
compact topological spaces is compact. 


REMARKS. This theorem is a fundamental tool in real analysis. We shall give 
the proof and then discuss how the theorem can be regarded as a generalization of 
the Cantor diagonal process used in the proofs earlier of the fact that any totally 
bounded complete metric space is compact (Theorem 2.46), the Helly Selection 
Principle (Problem 10 at the end of Chapter I), Ascoli’s Theorem (Theorems 
1.22 and 2.56), and, by implication, the Cauchy–Peano Existence Theorem for
differential equations (Problems 24-29 at the end of Chapter IV). The proof 
will make use of Zorn’s Lemma (Section A9 of the appendix), which is one 
formulation of the Axiom of Choice. Actually, the Axiom of Choice arises in two 
more transparent ways in the proof as well. One is simply in the statement that 
the topological product is a topological space; for this to be the case, the product 
has to be nonempty, and that is the content of the Axiom of Choice. The other 
is the construction of a particular element x in the product that occurs near the 
beginning of the proof below. 


PROOF. Let $X = \prod_{s \in S} X_s$ be given with each $X_s$ compact, and let $p_s : X \to X_s$
be the $s^{\text{th}}$ coordinate function. We are to prove that any open cover of $X$ has a
finite subcover, and we begin by proving a special case. Let $\mathcal{S}$ be the family of all
sets $p_s^{-1}(U_s)$ as $U_s$ varies through all open sets of $X_s$ and as $s$ varies. We know
that finite intersections of members of $\mathcal{S}$ form a base for the product topology on
$X$. For the special case let $\mathcal{U}$ be an open cover of $X$ by members of $\mathcal{S}$; we shall
produce a finite subcover. For each $s$, let $\mathcal{B}_s$ be the family of all open sets $U_s$ in
$X_s$ such that $p_s^{-1}(U_s)$ is in $\mathcal{U}$. We may assume for each $s$ that no finite subfamily
of $\mathcal{B}_s$ covers $X_s$, since otherwise the corresponding finitely many sets $p_s^{-1}(U_s)$
would cover $X$. By compactness of $X_s$, $\mathcal{B}_s$ does not cover $X_s$; say that $x_s$ is not
covered. The point $x$ of $X$ whose $s^{\text{th}}$ coordinate is $x_s$ then belongs to no member
of $\mathcal{U}$, and $\mathcal{U}$ cannot be a cover. This contradiction shows that the special $\mathcal{U}$ has a
finite subcover.

Now let $\mathcal{U}$ be any open cover of $X$, and suppose that no finite subfamily of $\mathcal{U}$
covers $X$. Let $\mathcal{C}$ be the system of all open covers $\mathcal{V}$ of $X$ such that $\mathcal{U} \subseteq \mathcal{V}$ and such
that no finite subfamily of $\mathcal{V}$ covers $X$. The set $\mathcal{C}$ is partially ordered by inclusion
upward and is nonempty, having $\mathcal{U}$ as a member. If $\{\mathcal{V}_\alpha\}$ is a chain in $\mathcal{C}$, then we
shall show that $\mathcal{V} = \bigcup_\alpha \mathcal{V}_\alpha$ is in $\mathcal{C}$ and hence is an upper bound in $\mathcal{C}$ for the chain
$\{\mathcal{V}_\alpha\}$. In fact, $\mathcal{V}$ is certainly an open cover. If it has a finite subcover, then each
member of the finite subcover lies in one of the covers, say $\mathcal{V}_{\alpha_j}$. Since $\{\mathcal{V}_\alpha\}$ is a
chain, all members of the finite subcover lie in the largest of those $\mathcal{V}_{\alpha_j}$’s. Thus
one of the $\mathcal{V}_\alpha$’s fails to be in $\mathcal{C}$, and we arrive at a contradiction. We conclude
that every chain in $\mathcal{C}$ has an upper bound in $\mathcal{C}$. By Zorn’s Lemma let $\mathcal{U}^*$ be a
maximal cover from $\mathcal{C}$ of $X$.
The family $\mathcal{S} \cap \mathcal{U}^*$ of all members of $\mathcal{U}^*$ that are in the family $\mathcal{S}$ of the first
paragraph of the proof has the property that no finite subfamily is a cover of $X$.
By the result of the first paragraph, $\mathcal{S} \cap \mathcal{U}^*$ cannot be a cover of $X$. Hence we
shall have arrived at a contradiction if we show that the union of the members
of $\mathcal{U}^*$ is contained in the union of the members of $\mathcal{S} \cap \mathcal{U}^*$. Let $U$ be a member
of $\mathcal{U}^*$, and fix a point $x$ in $U$. Since finite intersections of members of $\mathcal{S}$ form a
base, Proposition 10.1 shows that there are members $S_1, \dots, S_n$ of $\mathcal{S}$ such that
$x$ is in $S_1 \cap \cdots \cap S_n$ and $S_1 \cap \cdots \cap S_n \subseteq U$. We shall show that one of the sets
$S_j$ is in $\mathcal{U}^*$, hence in $\mathcal{U}^* \cap \mathcal{S}$, and then the proof will be complete.

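If $S_1$ is in $\mathcal{U}^*$, we are finished. Otherwise, by the maximality of $\mathcal{U}^*$, there are
finitely many open sets $C_1, \dots, C_k$ of $\mathcal{U}^*$ such that $X = S_1 \cup C_1 \cup \cdots \cup C_k$.
Again by the maximality, no open set containing $S_1$ can belong to $\mathcal{U}^*$, since the
union of that set with $C_1 \cup \cdots \cup C_k$ would be $X$. Proceeding inductively, suppose
we have shown that no open set containing $S_1 \cap \cdots \cap S_i$ is in $\mathcal{U}^*$ and that there
are open sets $D_1, \dots, D_m$ in $\mathcal{U}^*$ with

\[ X = (S_1 \cap \cdots \cap S_i) \cup (D_1 \cup \cdots \cup D_m). \]

If, as we may assume, $S_{i+1}$ is not in $\mathcal{U}^*$, then by maximality of $\mathcal{U}^*$, there are open
sets $E_1, \dots, E_r$ in $\mathcal{U}^*$ such that $X = S_{i+1} \cup E_1 \cup \cdots \cup E_r$. Then

\[ X - S_{i+1} \subseteq E_1 \cup \cdots \cup E_r, \]

and

\[ S_{i+1} = (S_1 \cap \cdots \cap S_{i+1}) \cup \big(S_{i+1} \cap (D_1 \cup \cdots \cup D_m)\big)
   \subseteq (S_1 \cap \cdots \cap S_{i+1}) \cup (D_1 \cup \cdots \cup D_m). \]

Hence

\[ X = S_{i+1} \cup (X - S_{i+1}) \subseteq \big((S_1 \cap \cdots \cap S_{i+1}) \cup (D_1 \cup \cdots \cup D_m)\big) \cup (E_1 \cup \cdots \cup E_r). \]

That is,

\[ X = (S_1 \cap \cdots \cap S_{i+1}) \cup (D_1 \cup \cdots \cup D_m \cup E_1 \cup \cdots \cup E_r). \]

Therefore, once again by maximality of $\mathcal{U}^*$, no open set containing $S_1 \cap \cdots \cap S_{i+1}$
can be in $\mathcal{U}^*$, and the induction is complete. In particular, $U$, which is an open
set containing $S_1 \cap \cdots \cap S_n$, is not in $\mathcal{U}^*$. This contradiction concludes the proof.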


As announced above, the Tychonoff Product Theorem is a generalization of
the Cantor diagonal process. In fact, let us see how that diagonal process may be
used to show directly that the product of a sequence of copies of $[0, 1]$ is compact.
Denote the product as a set by $X = \prod_{n=1}^\infty [0, 1]$. A member of $X$ is a sequence
$\{x_n\}$ with terms $x_n$. Let us impose on $X$ the Hilbert-cube metric of Example 11
in Section II.1:

\[ d(\{x_n\}, \{y_n\}) = \sum_{n=1}^{\infty} 2^{-n} |x_n - y_n|. \]

We show below in Corollary 10.29 that this metric on $X$ yields the product
topology. By Theorem 2.36 the space $X$ will then be compact if every sequence
in $X$ has a convergent subsequence. A sequence in $X$ means a system $\{x_n^{(m)}\}$ in
which the $n^{\text{th}}$ term of the $m^{\text{th}}$ sequence is $x_n^{(m)}$. Convergence is term-by-term
convergence. To produce a convergent subsequence of sequences, we iterate use
of the Bolzano–Weierstrass property of $[0, 1]$. Remembering that $m$ tells which
sequence we are dealing with, we find first a subcollection $\{m_j\}$ of the indices $m$
such that we have convergence along the $m_j$’s for $n = 1$, then a subcollection
$\{m_{j_k}\}$ of that such that we have convergence along the $m_{j_k}$’s for $n = 2$, and so on.
Since the intersection of all these sequences may be empty, we instead obtain
a convergent subsequence of our sequences by requiring that the $k^{\text{th}}$ term of
the desired subsequence be the $k^{\text{th}}$ term of the $k^{\text{th}}$ subsequence. This “diagonal
process” thus shows that any sequence in $X$ has a convergent subsequence. Hence
$X$, being a metric space, is compact.

The general Tychonoff Product Theorem may thus be viewed as a topological 
generalization of the diagonal process to product spaces with an uncountable 
number of factors. 


Here is one way in which the Tychonoff Product Theorem is used in real
analysis. For the situation in which we have a set $Y$ and a system of functions
$f_s : Y \to \mathbb{C}$ for $s$ in some set $S$, the first section of this chapter introduced
the weak topology for $Y$ determined by $\{f_s\}_{s \in S}$. This is the weakest topology
making all the functions $f_s$ continuous. Often in analysis a set $Y$ and a system
of functions $f_s$ of this kind arise in a construction, and then this weak topology
is imposed on $Y$. In favorable cases it turns out that each function $f_s$ is bounded
on $Y$. In this case if there are enough functions $f_s$ to separate points of $Y$
(i.e., enough so that for each pair of distinct points $x$ and $y$ there is some $s$ with
$f_s(x) \ne f_s(y)$), then $Y$ is a candidate for a compact Hausdorff space. To see what is needed for
compactness, let $X_s$ be a compact subset of $\mathbb{C}$ containing the image of $f_s$, and let
$X = \prod_{s \in S} X_s$. Define a function $F : Y \to X$ by “$F(y)$ is the function whose $s^{\text{th}}$
coordinate is $f_s(y)$.” It is readily verified that $F$ is a homeomorphism of $Y$ onto
a subspace of the compact Hausdorff space $X$. Thus $Y$ is compact if and only if
$F(Y)$ is closed in $X$. Checking that a set is closed is much easier than checking
compactness directly, and it is especially easy if one uses “nets,” which are the
objects introduced in the next section as a useful generalization of sequences.

To complete our discussion, we still need to prove that the Hilbert-cube metric
on $X = \prod_{n=1}^\infty [0, 1]$ yields the product topology. It will be helpful to prove the
following more general result and to obtain the statement about the Hilbert cube
as a special case.


Proposition 10.28. Suppose that $X$ is a nonempty set and $\{d_n\}_{n \ge 1}$ is a sequence
of pseudometrics on $X$ such that $d_n(x, y) \le 1$ for all $n$ and for all $x$ and $y$ in $X$.
Then $d(x, y) = \sum_{n=1}^\infty 2^{-n} d_n(x, y)$ is a pseudometric. If the open balls relative
to $d_n$ are denoted by $B_n(r; x)$ and the open balls relative to $d$ are denoted by
$B(r; x)$, then the $B_n$’s and $B$’s are related as follows:


(a) whenever some $B_n(r_n; x)$ is given with $r_n > 0$, there exists some $B(r; x)$
with $r > 0$ such that $B(r; x) \subseteq B_n(r_n; x)$,

(b) whenever $B(r; x)$ is given with $r > 0$, there exist finitely many $r_n > 0$,
say for $n \le K$, such that $\bigcap_{n=1}^K B_n(r_n; x) \subseteq B(r; x)$.


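PROOF. For (a), choose $r = 2^{-n} r_n$. If $d(x, y) < r$, then $2^{-m} d_m(x, y) < r$ for
all $m$ and in particular $d_n(x, y) < 2^n r = r_n$.

For (b), choose $K$ large enough so that $2^{-K} < r/2$, and put $r_n = r/2$ for
$n \le K$. If $y$ is in $\bigcap_{n=1}^K B_n(r_n; x)$, then $d_n(x, y) < r_n = r/2$ for $n \le K$.
Hence

\[ d(x, y) = \sum_{n=1}^{K} 2^{-n} d_n(x, y) + \sum_{n=K+1}^{\infty} 2^{-n} d_n(x, y)
   < \frac{r}{2} \sum_{n=1}^{K} 2^{-n} + \sum_{n=K+1}^{\infty} 2^{-n}
   < \frac{r}{2} + 2^{-K} < \frac{r}{2} + \frac{r}{2} = r. \]

Therefore $y$ is in $B(r; x)$.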
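
The two ball inclusions can be spot-checked numerically for the Hilbert cube, where $d_n(x, y) = |x_n - y_n|$. The Python sketch below is an illustration under stated assumptions: sequences are truncated to finite lists (coordinates beyond the end are treated as equal), only points on a small grid are tested, and every function name is hypothetical. A passing check is evidence on those samples, not a proof.

```python
from itertools import product

def hilbert_cube_distance(x, y):
    """d(x, y) = sum over n >= 1 of 2^{-n} |x_n - y_n| for equal-length lists.

    Coordinates beyond the end of the lists are treated as equal (an assumption
    of this sketch), so they contribute nothing.
    """
    return sum(abs(a - b) / 2 ** n for n, (a, b) in enumerate(zip(x, y), start=1))

def radii_for_part_b(r):
    """Return (K, r/2) with 2^{-K} < r/2, as chosen in the proof of (b)."""
    K = 1
    while 2.0 ** (-K) >= r / 2:
        K += 1
    return K, r / 2

def in_coordinate_balls(x, y, K, radius):
    """Check that y lies in B_n(radius; x) for every n <= K."""
    return all(abs(a - b) < radius for a, b in list(zip(x, y))[:K])

GRID = (0.0, 0.25, 0.5, 0.75, 1.0)

def part_a_holds_on_grid(x, n, r_n):
    """(a): d(x, y) < 2^{-n} r_n should force |x_n - y_n| < r_n."""
    r = r_n / 2 ** n
    return all(abs(x[n - 1] - y[n - 1]) < r_n
               for y in product(GRID, repeat=len(x))
               if hilbert_cube_distance(x, list(y)) < r)

def part_b_holds_on_grid(x, r):
    """(b): y in every B_n(r/2; x), n <= K, should force d(x, y) < r."""
    K, radius = radii_for_part_b(r)
    return all(hilbert_cube_distance(x, list(y)) < r
               for y in product(GRID, repeat=len(x))
               if in_coordinate_balls(x, list(y), K, radius))

print(part_a_holds_on_grid([0.0, 0.0, 0.0, 0.0], n=2, r_n=0.4))
print(part_b_holds_on_grid([0.0, 0.0, 0.0, 0.0], r=0.9))
```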


Corollary 10.29. The Hilbert-cube metric on $X = \prod_{n=1}^\infty [0, 1]$ yields the
product topology.


PROOF. Proposition 10.28a implies that any basic open neighborhood of x 
in the product topology contains a basic open neighborhood in the Hilbert-cube 
metric topology. Proposition 10.28b shows that any basic open neighborhood of 
x in the Hilbert-cube metric topology contains a basic open neighborhood in the 
product topology. 


5. Sequences and Nets 


Sequences are of limited interest in general topological spaces. Nets, which are 
generalized sequences of a certain kind, are a useful substitute, and we introduce 
them in this section. Using nets, we shall be able to see that product topologies are 
appropriate for detecting pointwise convergence in the same way that the metric 
topology obtained from the supremum norm is appropriate for detecting uniform 
convergence. 

We begin with two examples that illustrate some of the difficulties with using
sequences in general topological spaces. We use the natural definition suggested
by Section II.4: a sequence $\{x_n\}$ in $X$ converges to $x_0$ if for each neighborhood
of $x_0$, there is some $N$ depending on the neighborhood such that $x_n$ is
in the neighborhood for $n \ge N$. We say that the sequence is eventually in the
neighborhood. The point $x_0$ is a limit of the sequence.


EXAMPLES.


(1) Let $X$ be the set of positive integers, and let a topology for $X$ consist of
the empty set and all sets whose complements are finite. If $x_n = 2n$, then the
sequence $\{x_n\}$ converges to every point of $X$ and hence does not have a unique
limit. The space $X$ is $T_1$ and has a countable local base at each point, but $X$ is
not Hausdorff.


(2) Let $X$ be the set of points $(m, n)$ in the plane with $m$ and $n$ integers $\ge 0$.
Define a topology for $X$ as follows. Any set not containing $(0, 0)$ is to be open. If
a set $U$ contains $(0, 0)$, then $U$ is defined to be open if there are only finitely many
columns $C_m = \{(m, n) \mid n = 0, 1, 2, \dots\}$ such that $C_m - (U \cap C_m)$ is infinite.
Enumerate $X$, and define $x_n$ to be the $n^{\text{th}}$ point in the enumeration. It is easy to
check that the image of the sequence $\{x_n\}$ has $(0, 0)$ as a limit point and that no
subsequence of $\{x_n\}$ converges to $(0, 0)$. The space $X$ is Hausdorff but does not
have a countable local base at $(0, 0)$.
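
For Example 1, the reason the sequence $x_n = 2n$ converges to every point can be made concrete: a nonempty open set omits only finitely many integers, so the sequence leaves it only finitely often. The Python sketch below computes an explicit threshold $N$; the function name and the sample neighborhood are hypothetical choices made for illustration.

```python
def tail_index(finite_complement):
    """Return the least N >= 1 with 2n outside `finite_complement` for all n >= N."""
    even_members = [m for m in finite_complement if m > 0 and m % 2 == 0]
    return max(even_members) // 2 + 1 if even_members else 1

# A hypothetical open neighborhood of the point 7: every positive integer
# except 4, 10, and 13. Its complement is finite, so the set is open.
excluded = {4, 10, 13}
N = tail_index(excluded)
print(N)
```

Since the threshold depends only on the finite complement and not on the point whose neighborhood is being considered, the same $N$ serves for every point of the open set, which is why the limit is far from unique.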


Thus the elementary results in Section II.4 do not generalize to all topological 
spaces. But Proposition 2.20 (the uniqueness of the limit of any sequence) 
is still valid if X is Hausdorff, and Proposition 2.22 and Corollary 2.23 (the 
characterization of limit points and of closed sets in terms of sequences) are still 
valid if X has a countable local base at each point. Nets will cure the problem 
about characterizing limit points and closed sets without countable local bases 
but not the problem about nonuniqueness of limits, and thus we shall be able to 
work well with nets in all Hausdorff spaces. In particular we shall be able to use 
nets in uncountable products of Hausdorff spaces, which arise frequently in real 
analysis and tend not to have a countable local base at each point. 

Before defining nets, let us give one positive result whose statement mixes
topological spaces and metric spaces. If $S$ is any nonempty set, we have made
$B(S)$, the vector space of all bounded scalar-valued functions on $S$, into a normed
linear space, and hence a metric space, by means of the supremum norm. If
$S$ is a topological space, let $C(S)$ be the subset of continuous members of $B(S)$;
this is a vector subspace and hence is itself a normed linear space.


Proposition 10.30. If $S$ is a topological space and $\{f_n\}$ is a sequence of scalar-valued
functions continuous at $s_0$ and converging uniformly to a function $f$, then
$f$ is continuous at $s_0$. Consequently the subspace $C(S)$ of $B(S)$ is a closed
subspace, and $C(S)$ is complete as a metric space.


PROOF. Given $\epsilon > 0$, choose $N$ such that $n \ge N$ implies $\|f_n - f\|_{\sup} < \epsilon$.
For any $s$, we then have

\[ \begin{aligned}
|f(s) - f(s_0)| &\le |f(s) - f_N(s)| + |f_N(s) - f_N(s_0)| + |f_N(s_0) - f(s_0)| \\
 &\le \|f - f_N\|_{\sup} + |f_N(s) - f_N(s_0)| + \|f_N - f\|_{\sup} \\
 &< 2\epsilon + |f_N(s) - f_N(s_0)|.
\end{aligned} \]

Since $f_N$ is continuous at $s_0$, there exists a neighborhood of $s_0$ such that the right
side is $< 3\epsilon$ for $s$ in that neighborhood. Thus $f$ is continuous at $s_0$.

If { fn} is a sequence in C(S) converging uniformly to f in B(S), then f is in 
C(S), by the result of the previous paragraph. Since convergence of sequences 
in B(S) is the same as uniform convergence, Corollary 2.23 shows that C(S) 
is a closed subset of B(S). Propositions 2.43 and 2.44 then show that C(S) is 
complete as a metric space. 


Now we turn our attention to nets. In the indexing for a net, the set of positive
integers is replaced by a “directed set,” which we define first. Let $D$ be a partially
ordered set in the sense of Section A9 of the appendix, the partial ordering being
denoted by $\le$. We say that $(D, \le)$ is a directed set if for any $\alpha$ and $\beta$ in $D$, there
is some $\gamma$ in $D$ with $\alpha \le \gamma$ and $\beta \le \gamma$.


EXAMPLES.

(1) Take $D$ to be the set of positive integers, and let $\le$ have the usual meaning.


(2) Let $S$ be a nonempty set, take $D$ to be the set of all finite subsets of $S$, and
let $\alpha \le \beta$ mean that the inclusion $\alpha \subseteq \beta$ holds.


(3) Let $X$ be a topological space, let $x$ be a point in $X$, take $D$ to be the set of
all neighborhoods of $x$, and let $\alpha \le \beta$ mean that $\alpha \supseteq \beta$.


(4) Let $(D_1, \le_1)$ and $(D_2, \le_2)$ be two directed sets, take $D$ to be $D_1 \times D_2$, and
let $(\alpha_1, \alpha_2) \le (\beta_1, \beta_2)$ mean that $\alpha_1 \le_1 \beta_1$ and $\alpha_2 \le_2 \beta_2$.
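
For finite instances, the directedness condition in these examples can be verified by exhaustive search. The following Python sketch does so for truncated versions of Examples 1, 2, and 4 and for one non-example; the helper names and the particular finite sets are assumptions introduced only for this illustration.

```python
from itertools import combinations, product

def is_directed(elements, leq):
    """True if every pair of elements has an upper bound in `elements`."""
    elements = list(elements)
    return all(any(leq(a, c) and leq(b, c) for c in elements)
               for a in elements for b in elements)

def finite_subsets(S):
    """All subsets of a finite set S, as frozensets."""
    items = list(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Example 1, truncated to {1, ..., 5} so that the search terminates.
integers = range(1, 6)
usual = lambda a, b: a <= b

# Example 2 for a hypothetical three-element set S, ordered by inclusion.
subsets = finite_subsets({1, 2, 3})
inclusion = lambda a, b: a <= b   # for frozensets, <= is the subset relation

# Example 4: the product order on D_1 x D_2.
pairs = list(product(integers, subsets))
product_order = lambda p, q: usual(p[0], q[0]) and inclusion(p[1], q[1])

# A non-example: two incomparable sets with no common upper bound in the family.
not_directed = [frozenset({1}), frozenset({2})]

print(is_directed(integers, usual), is_directed(subsets, inclusion),
      is_directed(pairs, product_order), is_directed(not_directed, inclusion))
```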


If $X$ is a nonempty set, a net in $X$ is a function from a directed set $D$ into $X$.
If $D$ needs to be specified to avoid confusion, we speak of a “net from $D$ to $X$.”
The function will often be written $\alpha \mapsto x_\alpha$ or $\{x_\alpha\}$. If $E$ is a subset of $X$, the net
is eventually in $E$ if there is some $\alpha_0$ in $D$ such that $\alpha_0 \le \alpha$ implies that $x_\alpha$ is in
$E$. The net is frequently in $E$ if for any $\alpha$ in $D$, there is a $\beta$ in $D$ with $\alpha \le \beta$
such that $x_\beta$ is in $E$. It is important to observe that the negation of “the net is
eventually in $E$” is that “the net is frequently in the complement of $E$.”

The directedness of the set $D$ plays an important role in the theory by allowing
us to work simultaneously with finitely many conditions on a net. For example,
if $\{x_\alpha\}$ is eventually in $E_1$ and eventually in $E_2$, then it is eventually in $E_1 \cap E_2$.
In fact, the given conditions say that there are members $\alpha_1$ and $\alpha_2$ of $D$ such that
$x_\alpha$ is in $E_1$ for $\alpha_1 \le \alpha$ and $x_\alpha$ is in $E_2$ for $\alpha_2 \le \alpha$. The directedness implies that
$\alpha_1 \le \alpha_0$ and $\alpha_2 \le \alpha_0$ for some $\alpha_0$ in $D$. Then $x_\alpha$ is in $E_1 \cap E_2$ for $\alpha_0 \le \alpha$.
This kind of argument will be used often without mention of the details.
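
The definitions of “eventually” and “frequently,” and the duality between them, can be transcribed literally for a finite directed set. The Python sketch below is a hedged illustration with hypothetical names. A finite directed set always has a largest element, so the example is degenerate as a limiting process; its purpose is only to check the logical form of the definitions.

```python
from itertools import combinations

def eventually_in(D, leq, net, E):
    """Some a0 in D has net(a) in E for every a in D with a0 <= a."""
    return any(all(net(a) in E for a in D if leq(a0, a)) for a0 in D)

def frequently_in(D, leq, net, E):
    """For every a in D there is some b in D with a <= b and net(b) in E."""
    return all(any(net(b) in E for b in D if leq(a, b)) for a in D)

# Hypothetical finite directed set: all subsets of {1, 2, 3} under inclusion.
D = [frozenset(c) for r in range(4) for c in combinations([1, 2, 3], r)]
leq = lambda a, b: a <= b
net = lambda a: len(a) % 2        # a net in X = {0, 1}
X = {0, 1}

# "Not eventually in E" should coincide with "frequently in the complement of E".
all_E = [set(), {0}, {1}, {0, 1}]
duality = all(eventually_in(D, leq, net, E) != frequently_in(D, leq, net, X - E)
              for E in all_E)
print(duality)
```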

If $X$ is a topological space, a net $\{x_\alpha\}$ in $X$ converges to $x_0$ in $X$ if $\{x_\alpha\}$ is
eventually in each neighborhood of $x_0$. In this case we write $x_\alpha \to x_0$, and we
say that $x_0$ is a limit of $\{x_\alpha\}$. Because of the availability of Examples 3 and 4
above, it is an easy matter to characterize the terms “Hausdorff,” “limit point,”
“closed set,” and “continuous at a point” in terms of convergence of nets.


Proposition 10.31. A topological space X is Hausdorff if and only if every 
convergent net in X has only one limit. 


PROOF. Suppose that $X$ is Hausdorff and that $x_\alpha \to x_0$ and $x_\alpha \to y_0$ with
$x_0 \ne y_0$. Choose disjoint open sets $U$ and $V$ with $x_0$ in $U$ and $y_0$ in $V$. By the
assumed convergence, $\{x_\alpha\}$ is in $U$ eventually and is in $V$ eventually. Then it is
in $U \cap V = \varnothing$ eventually, and we have a contradiction.

Suppose that $X$ is not Hausdorff. Find distinct points $x_0$ and $y_0$ such that every
pair of neighborhoods $U$ of $x_0$ and $V$ of $y_0$ has nonempty intersection. For any
such pair $(U, V)$, define $x_{(U,V)}$ to be some point in the intersection. Combining
Examples 3 and 4 above, we see that $(U, V) \mapsto x_{(U,V)}$ is a net in $X$ converging to
both $x_0$ and $y_0$.


Proposition 10.32. If X is a topological space, then 


(a) for any subset $A$ of $X$ and limit point $x_0$ of $A$, there exists a net in $A - \{x_0\}$
converging to $x_0$,

(b) any convergent net $\{x_\alpha\}$ in $X$ with limit $x_0$ in $X$ either has $x_0$ as a limit
point of the image of the net or else is eventually constantly equal to $x_0$.


PROOF. For (a), the definition of limit point implies that for each neighborhood
$U$ of $x_0$, the set $U \cap (A - \{x_0\})$ is nonempty. If $x_U$ denotes a point in the intersection,
then $U \mapsto x_U$ is a net in $A - \{x_0\}$ converging to $x_0$.

For (b), suppose that $x_0$ is not a limit point of the image of the net. Then there
exists a neighborhood $U$ of $x_0$ such that $U - \{x_0\}$ is disjoint from the image of
the net. Since the convergence implies that the net is eventually in $U$, it must be
true that $x_\alpha = x_0$ eventually.


Corollary 10.33. If $X$ is a topological space, then a subset $F$ of $X$ is closed if
and only if every convergent net in $F$ has its limit in $F$.


PROOF. Suppose that $F$ is closed and that $\{x_\alpha\}$ is a convergent net in $F$ with
limit $x_0$. By Proposition 10.32b, either $x_0$ is in the image of the net or $x_0$ is a limit
point of the image of the net. In the latter case, $x_0$ is a limit point of the larger set
$F$. In either case, $x_0$ is in $F$; thus the limit of any convergent net in $F$ is in $F$.

Conversely suppose every convergent net in $F$ has its limit in $F$. If $x_0$ is a limit
point of $F$, then Proposition 10.32a produces a net in $F - \{x_0\}$ converging to $x_0$.
By assumption, the limit $x_0$ is in $F$. Therefore $F$ contains all its limit points and
is closed.


Proposition 10.34. Let $f : X \to Y$ be a function between topological spaces.
Then $f$ is continuous at a point $x_0$ in $X$ if and only if whenever $\{x_\alpha\}$ is a convergent
net in $X$ with limit $x_0$, then $\{f(x_\alpha)\}$ is convergent in $Y$ with limit $f(x_0)$.


REMARKS. This result needs to be used with caution if $Y$ is not known to be
Hausdorff. For example, let $X$ and $Y$ both be the set $\{a, b\}$. Let the topology
for $X$ be discrete and the topology for $Y$ be indiscrete, consisting only of $\varnothing$
and the whole space. Every function $f : X \to Y$ is continuous. Suppose that
$f(a) = f(b) = a$. Take $x_0 = b$ and $x_\alpha = b$ for all $\alpha$. Then $\{f(x_\alpha)\}$ converges
to both $a$ and $b$. Hence we cannot evaluate $f(x_0)$ as just any limit of $\{f(x_\alpha)\}$; we
have to pick the right limit.


PROOF. Suppose that $f$ is continuous at $x_0$ and that $\{x_\alpha\}$ is a convergent net in
$X$ with limit $x_0$. Let $V$ be any open neighborhood of $f(x_0)$. By continuity, there
exists an open neighborhood $U$ of $x_0$ such that $f(U) \subseteq V$. Since $x_\alpha \to x_0$, the
members $x_\alpha$ of the net are eventually in $U$. Then $f(x_\alpha)$ is in $f(U) \subseteq V$ for the
same $\alpha$’s, hence eventually. Therefore $\{f(x_\alpha)\}$ converges to $f(x_0)$.

Conversely suppose that $x_\alpha \to x_0$ always implies $f(x_\alpha) \to f(x_0)$. We are to
show that $f$ is continuous at $x_0$. If $V$ is an arbitrary open neighborhood of $f(x_0)$, we
seek some open neighborhood of $x_0$ that maps into $V$ under $f$. Assuming that
there is no such neighborhood for some $V$, we can find, for each neighborhood
$U$ of $x_0$, some $x_U$ in $U$ such that $f(x_U)$ is not in $V$. Then $x_U \to x_0$, but $\{f(x_U)\}$
does not have limit $f(x_0)$ because $f(x_U)$ is never in $V$. This is a contradiction,
and we conclude that some $U$ maps into $V$ under $f$; thus $f$ is continuous at $x_0$.


Proposition 10.35. Let $X = \prod_{s \in S} X_s$ be the product of topological spaces
$X_s$, and let $p_s : X \to X_s$ be the $s^{\text{th}}$ coordinate function. Then a net $\{x_\alpha\}$ in
$X$ converges to some $x_0$ in $X$ if and only if the net $\{p_s(x_\alpha)\}$ in $X_s$ converges to
$p_s(x_0)$ for each $s$ in $S$.


REMARK. This is the sense in which the product topology is the topology of
pointwise convergence. In combination with Corollary 10.33, this proposition
simplifies the problem of deciding when a subset of a product space is closed in
the product topology.


PROOF. If $\{x_\alpha\}$ converges to $x_0$, then Proposition 10.34 and the continuity of
$p_s$ together imply that $\{p_s(x_\alpha)\}$ converges to $p_s(x_0)$.

Conversely suppose that $\{p_s(x_\alpha)\}$ converges to $p_s(x_0)$ for all $s$. Fix $s$. If $U_s$ is
an open neighborhood of $p_s(x_0)$ in $X_s$, then $\{p_s(x_\alpha)\}$ is eventually in $U_s$. Hence
there is some $\alpha_0$ such that $p_s(x_\alpha)$ is in $U_s$ whenever $\alpha_0 \le \alpha$. For the same values
of $\alpha$, $x_\alpha$ is in $p_s^{-1}(U_s)$. Thus $\{x_\alpha\}$ is eventually in $p_s^{-1}(U_s)$.

Any neighborhood $N$ of $x_0$ in $X$ contains some basic open neighborhood of
the form $U = p_{s_1}^{-1}(U_{s_1}) \cap \cdots \cap p_{s_n}^{-1}(U_{s_n})$. It follows from the result of the
previous paragraph that $\{x_\alpha\}$ is eventually in each $p_{s_j}^{-1}(U_{s_j})$, hence is eventually in
the intersection $U$, and hence is eventually in $N$. Therefore $\{x_\alpha\}$ converges to $x_0$.


One can express also the notion of compactness in terms of nets, the idea 
being that compactness of X is equivalent to the fact that every net in X has a 
convergent subnet, for an appropriate definition of “subnet.” The remainder of 
this section will deal with this question. Carrying out the details of this equiv- 
alence is harder than what we have done so far with nets. Actually, the main 
benefit of the equivalence is the resulting simplification to proofs of compactness, 
especially to the proof of the Tychonoff Product Theorem. Since we have already 
proved the Tychonoff Product Theorem without nets, the material in the remainder 
of this section will be used only in minor ways in the rest of the book.⁶

Let $D$ and $E$ be directed sets. A function from $E$ to $D$, written $\mu \mapsto \alpha_\mu$, is
cofinal⁷ if for any $\beta$ in $D$, there is a $\nu$ in $E$ such that $\beta \le \alpha_\mu$ whenever $\nu \le \mu$.
If $\mu \mapsto \alpha_\mu$ is cofinal and if $\alpha \mapsto x_\alpha$ is a net from $D$ to $X$, then the composition
$\mu \mapsto x_{\alpha_\mu}$ is a net from $E$ to $X$ and is called a subnet of the net $\alpha \mapsto x_\alpha$.

The prototype of a subnet is a subsequence. In this case, $D$ and $E$ are both
the set of positive integers, and the function from $E$ to $D$ is $k \mapsto n_k$. If the
sequence is $\{a_n\}$, then the subnet/subsequence is $\{a_{n_k}\}$. For a general subnet one
might expect that it would suffice always to take $E$ to be a subset of $D$ and to
let the function from $E$ to $D$ be inclusion. However, this definition of subnet is
insufficient to prove the desired characterization of compactness in terms of nets
and subnets.

A net from a directed set $D$ to a nonempty set $X$ is called universal if for any
subset $A$ of $X$, the net is eventually in $A$ or eventually in $A^c$. It of course cannot
be eventually in both, since otherwise it would eventually be in the intersection,
namely the empty set.


Proposition 10.36. Each net in a nonempty set X has a universal subnet. 


REMARK. The proof will use Zorn’s Lemma. Apart from this one use, the only 
other uses of the Axiom of Choice in the remainder of this section are transparent 
ones. 


⁶Nets play a more significant role in the companion volume Advanced Real Analysis.
⁷This definition is not the standard one given in Kelley’s General Topology, but it leads to the
standard definition of “subnet.”


PROOF. Let $D$ be a directed set, and let $\alpha \mapsto x_\alpha$ be a net from $D$ to $X$. Consider
all families $\mathcal{C}_\beta$ of subsets of $X$ that are closed under finite intersections and have
the property, for each $A$ in $\mathcal{C}_\beta$, that the net is frequently in $A$. There exists such a
family, the singleton family $\{X\}$ being one. Partially order the set of such families
by inclusion upward, saying that $\mathcal{C}_\beta \le \mathcal{C}_\gamma$ when $\mathcal{C}_\beta \subseteq \mathcal{C}_\gamma$. In any chain of $\mathcal{C}_\beta$’s,
let $\mathcal{C}_\infty$ be the union of the sets in the various members of the chain. Since closure
under intersection depends only on two sets at a time and since the other property
of a $\mathcal{C}_\beta$ depends only on one set at a time, $\mathcal{C}_\infty$ is again a family of this kind. By
Zorn’s Lemma let $\mathcal{C}$ be a maximal such family.

Let us prove for each subset $A$ of $X$ that either $A$ or $A^c$ is in $\mathcal{C}$. In fact, if for
every $B$ in $\mathcal{C}$, the net is frequently in $A \cap B$, then $\mathcal{C} \cup \{A\}$ is a family containing $\mathcal{C}$
and satisfying the two defining properties of one of our families. By maximality,
$\mathcal{C} \cup \{A\} = \mathcal{C}$. Hence $A$ is in $\mathcal{C}$. Assuming that $A$ is not in $\mathcal{C}$, we obtain a set $B$ in
$\mathcal{C}$ such that the net fails to be frequently in $A \cap B$. Then $B$ is a member of $\mathcal{C}$ such
that the net is eventually in $(A \cap B)^c$.

Similarly if we assume that $A^c$ is not in $\mathcal{C}$, we obtain a set $B'$ in $\mathcal{C}$ such that
the net is eventually in $(A^c \cap B')^c$. If neither $A$ nor $A^c$ is in $\mathcal{C}$, then the net is
eventually in

\[ \begin{aligned}
(A \cap B)^c \cap (A^c \cap B')^c &= (A^c \cup B^c) \cap (A \cup B'^c) \\
 &= \big(A^c \cap (A \cup B'^c)\big) \cup \big(B^c \cap (A \cup B'^c)\big) \\
 &= (A^c \cap B'^c) \cup \big(B^c \cap (A \cup B'^c)\big) \\
 &\subseteq B'^c \cup B^c = (B \cap B')^c,
\end{aligned} \]

and it cannot be frequently in $B \cap B'$. This contradicts the fact that $B \cap B'$ is
in $\mathcal{C}$ because $\mathcal{C}$ is closed under finite intersections. This completes the proof that
either $A$ or $A^c$ has to be in $\mathcal{C}$.
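
The set-theoretic inclusion used in this step, $(A \cap B)^c \cap (A^c \cap B')^c \subseteq (B \cap B')^c$, can be confirmed by brute force over all subsets of a small universe. The Python sketch below is only a sanity check with hypothetical function names; because the inclusion is a pointwise statement about membership in $A$, $B$, and $B'$, a small universe already exercises every membership pattern.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def inclusion_holds(universe):
    """Check (A & B)^c & (A^c & B2)^c <= (B & B2)^c for all A, B, B2."""
    X = frozenset(universe)
    comp = lambda S: X - S            # complement within the universe
    for A in subsets(X):
        for B in subsets(X):
            for B2 in subsets(X):     # B2 plays the role of B'
                left = comp(A & B) & comp(comp(A) & B2)
                if not left <= comp(B & B2):
                    return False
    return True

print(inclusion_holds(range(3)))
```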

The members of $\mathcal{C}$ form a directed set under inclusion downward, i.e., with
partial ordering $A \le B$ if $A \supseteq B$. Form $\mathcal{C} \times D$ as a directed set under the
definition in Example 4 at the beginning of this section. We construct a subnet as
follows. For each ordered pair $(A, \beta)$ in $\mathcal{C} \times D$, let $\alpha_{(A,\beta)}$ be an element of $D$ with
$\beta \le \alpha_{(A,\beta)}$ and with $x_{\alpha_{(A,\beta)}}$ in $A$; this choice is possible since $D$ is directed and the
given net is frequently in $A$. The function $(A, \beta) \mapsto \alpha_{(A,\beta)}$ is cofinal because for
any $\beta \in D$, the domain value $(A, \beta)$ has $\beta \le \alpha_{(B,\gamma)}$ whenever $(A, \beta) \le (B, \gamma)$.
Thus $(A, \beta) \mapsto x_{\alpha_{(A,\beta)}}$ is a subnet.

To complete the proof, we show that this subnet is universal. For any subset $A$
of $X$, we have seen that either $A$ or $A^c$ has to be in $\mathcal{C}$. Without loss of generality,
assume that $A$ is in $\mathcal{C}$. For any fixed $\beta$, the inequality $(A, \beta) \le (B, \gamma)$ implies
that $x_{\alpha_{(B,\gamma)}}$ is in the subset $B$ of $A$, and hence the subnet is eventually in $A$.


Proposition 10.37. The following three statements about a topological space 
X are equivalent:

(a) X is compact,
(b) every universal net in X is convergent, 
(c) every net in X has a convergent subnet. 


PROOF. To prove that (a) implies (b), let $\{x_\alpha\}$ be a universal net in $X$, and
suppose that $\{x_\alpha\}$ is not convergent. For each $x$ in $X$, there is then an open
neighborhood $U_x$ of $x$ such that $\{x_\alpha\}$ is not eventually in $U_x$. Since the net is
universal, it is eventually in $(U_x)^c$ for each $x$. The open sets $U_x$ cover $X$. By
compactness, let $\{U_{x_1}, \dots, U_{x_n}\}$ be a finite subcover. The net is eventually in
each $(U_{x_j})^c$ and hence is eventually in their intersection. But their intersection is
empty since $X = \bigcup_{j=1}^n U_{x_j}$. We have arrived at a contradiction, and thus $\{x_\alpha\}$
must be convergent.

Statement (b) implies statement (c) since every net has a universal subnet, by
Proposition 10.36.

To prove that (c) implies (a), suppose that $X$ is noncompact. We shall produce
a net with no convergent subnet. If $\mathcal{U}$ is an open cover of $X$ with no finite subcover,
we shall use $\mathcal{U}$ to define a directed set. Let $F$ be the set of all finite subcollections
of members of $\mathcal{U}$. This is directed under inclusion upward: $\alpha \le \beta$ if $\alpha \subseteq \beta$. For
each $\alpha$ in $F$, the set $X - \bigcup_{U \in \alpha} U$ is not empty since $\mathcal{U}$ has no finite subcover, and
we let $x_\alpha$ be an element of $X - \bigcup_{U \in \alpha} U$. Then $\alpha \mapsto x_\alpha$ is a net. Suppose that
$\{x_\alpha\}$ has a convergent subnet, with some $x_0$ as limit. For any neighborhood $N$ of
$x_0$, $\{x_\alpha\}$ is frequently in $N$. Since $\mathcal{U}$ is a covering, there is some $U$ in $\mathcal{U}$ with $x_0$
in $U$. By construction, $x_\alpha$ is not in $U$ as soon as $\alpha$ has $\{U\} \le \alpha$, and thus $\{x_\alpha\}$ is
not frequently in the neighborhood $U$ of $x_0$, contradiction. We conclude
that no subnet of $\{x_\alpha\}$ converges.


Proposition 10.37 gives the statement about general topological spaces that
extends the equivalence of the Bolzano–Weierstrass property and the Heine–Borel
property of closed bounded subsets of Euclidean space. To illustrate the
power of nets, we can now use them to give a second proof of the Tychonoff
Product Theorem (Theorem 10.27).


SECOND PROOF OF TYCHONOFF PRODUCT THEOREM. Let $X = \prod_{s \in S} X_s$ be
given with each $X_s$ compact, let $p_s : X \to X_s$ be the $s$-th coordinate function,
and let $\{x_\alpha\}$ be a universal net in X. Fix s, and let $A_s$ be any subset of $X_s$.
Since the net is universal, it is eventually in $p_s^{-1}(A_s)$ or in $(p_s^{-1}(A_s))^c$. Since
$(p_s^{-1}(A_s))^c = p_s^{-1}((A_s)^c)$, the net $\{p_s(x_\alpha)\}$ is eventually in $A_s$ or in $(A_s)^c$. Thus
$\{p_s(x_\alpha)\}$ is a universal net in $X_s$. By Proposition 10.37 and the compactness of
$X_s$, $\{p_s(x_\alpha)\}$ converges to some member $x_s$ of $X_s$. Now let s vary. Forming the
member x of X with $p_s(x) = x_s$ for all s and applying Proposition 10.35, we see
that $x_\alpha \to x$. By Proposition 10.37, X is compact.
6. Quotient Spaces


If X is a topological space and $\sim$ is an equivalence relation on X, then we saw
in Section 1 that the set $X/{\sim}$ of equivalence classes inherits a natural topology
known as the “quotient topology.” If $q : X \to X/{\sim}$ is the quotient map, then
a subset U of $X/{\sim}$ is defined to be open in the quotient topology if $q^{-1}(U)$ is
open in X. The quotient topology is then the finest topology on $X/{\sim}$ that makes
the quotient map continuous.

Without some assumption that relates the equivalence relation to the topology 
of X, we cannot expect much from general quotient spaces. In this section 
we shall investigate situations in which the quotient space does have reasonable 
properties. Ultimately our interest will be in four situations, some of which are 
hinted at in Section 1: 


(i) the passage from a regular topological space to the quotient when the
equivalence relation is that $x \sim y$ if x is in $\{y\}^{\mathrm{cl}}$ (Proposition 10.7),

(ii) the passage from a compact Hausdorff space X to the quotient when the 
equivalence relation is closed as a subset of X x X (to be discussed in 
Problem 11 at the end of the chapter), 

(iii) the passage from a “topological vector space” or “topological group” to 
a coset space (to be discussed in the companion volume Advanced Real 
Analysis), 

(iv) the piecing together of a “manifold,” or a “vector bundle,” or a “cov- 
ering space” from local data (to be discussed in the companion volume 
Advanced Real Analysis). 


We begin with some general facts. The first is a kind of “universal mapping 
property” for all quotient spaces. Its corollary describes a situation in which we 
can recognize a given space as a quotient even if it was not constructed that way: 
we say that a function $F : X \to Y$ is open if F carries open sets to open sets.


Proposition 10.38. 


(a) Let $F : X \to Y$ be a continuous function between topological spaces, let
$\sim$ be an equivalence relation on X, and let $q : X \to X/{\sim}$ be the quotient map.
Suppose that F has the property that $F(x_1) = F(x_2)$ whenever $x_1 \sim x_2$, so that
there exists a well-defined function $f : X/{\sim} \to Y$ such that $F = f \circ q$. Then
f is continuous.

(b) The quotient $X/{\sim}$ is characterized by the property in (a) in the following
sense. Suppose that $q' : X \to Z$ is any continuous function of X onto a
topological space Z such that

(i) $x_1 \sim x_2$ implies $q'(x_1) = q'(x_2)$,
(ii) whenever $F : X \to Y$ is a continuous function such that $x_1 \sim x_2$
implies $F(x_1) = F(x_2)$, there exists a continuous function $f' : Z \to Y$
with $F = f' \circ q'$.
Then Z is canonically homeomorphic to $X/{\sim}$.

PROOF. In (a), we want to know that $f^{-1}(U)$ is open in $X/{\sim}$ whenever U is
open in Y. By definition of the quotient topology, $f^{-1}(U)$ is open in $X/{\sim}$ if and
only if $q^{-1}(f^{-1}(U))$ is open in X. This set is $F^{-1}(U)$, which is open since F is
assumed continuous.

In (b), suppose Z and $q'$ are such that $q' : X \to Z$ has the stated properties.
Taking F to be $q : X \to X/{\sim}$, which is constant on equivalence classes, we see
that property (ii) of Z gives us a continuous function $f' : Z \to X/{\sim}$ such that
$q = f' \circ q'$. Then we apply the result of (a) with F taken to be $q' : X \to Z$, and
(a) shows that the function $f : X/{\sim} \to Z$ with $q' = f \circ q$ is continuous. Combining
these two equations gives us $q = f' \circ f \circ q$ and $q' = f \circ f' \circ q'$. Thus $f' \circ f$ is the
identity on the image of q, and $f \circ f'$ is the identity on the image of $q'$. Since q is
onto $X/{\sim}$ and $q'$ is onto Z, $f : X/{\sim} \to Z$ is a homeomorphism.


Corollary 10.39. Let $F : X \to Y$ be a continuous function from one
topological space onto another, and define $x_1 \sim x_2$ if $F(x_1) = F(x_2)$. Let
$q : X \to X/{\sim}$ be the quotient map, and let $f : X/{\sim} \to Y$ be the continuous
map such that $F = f \circ q$. If F is open, then f is a homeomorphism and hence
Y can be regarded as a quotient of X.


REMARK. The continuity of f is the conclusion of Proposition 10.38a. 


PROOF. The function $f : X/{\sim} \to Y$ is continuous, one-one, and onto. To
see that f is open and hence is a homeomorphism, let an open set U in $X/{\sim}$ be
given. Then $F(q^{-1}(U))$ is open because q is continuous and F is open. Since
$F(q^{-1}(U)) = f(q(q^{-1}(U))) = f(U)$, we see that $f(U)$ is open. Hence f is
open.
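
A familiar instance of Corollary 10.39 may help fix ideas. The sketch below is an illustration only and is not taken from the text: the choice of X, Y, and F is ours, and it assumes the standard fact that the exponential map of the line onto the circle is an open map.

```latex
% Illustration only: X, Y, and F are chosen for this example.
Let $X = \mathbb{R}$, let $Y = S^1 = \{z \in \mathbb{C} \mid |z| = 1\}$, and let
$F(t) = e^{it}$. Then $F$ is continuous and onto, and
\[
  t_1 \sim t_2 \iff F(t_1) = F(t_2) \iff t_1 - t_2 \in 2\pi\mathbb{Z}.
\]
% Openness: each t lies in an open interval I of length < 2\pi, and F
% carries every open subinterval of I onto an open arc of S^1.
Since $F$ is open, Corollary 10.39 yields a homeomorphism
\[
  f : \mathbb{R}/{\sim} \longrightarrow S^1, \qquad f(q(t)) = e^{it},
\]
so that the circle can be regarded as a quotient of the line.
```

The point of the example is that openness of F, rather than any special feature of the circle, is what the corollary uses.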


EXAMPLE. Let $X = \prod_{s \in S} X_s$ be a product of topological spaces, fix s in S,
and let $p_s : X \to X_s$ be the $s$-th coordinate function. We shall show that $p_s$ is
open, so that by Corollary 10.39, $X_s$ can be regarded as the quotient of X by the
relation that $x_1 \sim x_2$ if $p_s(x_1) = p_s(x_2)$. If U is an open set in X and x is in U, then we
can find a basic open set $V_x = p_{s_1}^{-1}(U_1) \cap \cdots \cap p_{s_n}^{-1}(U_n)$ about x that is contained
in U. Then $p_s(V_x)$ equals $U_j$ if $s = s_j$, and it equals $X_s$ if s is not equal to any
$s_j$. In either case, $p_s(V_x)$ is open. Thus $p_s(U)$ contains a neighborhood of each
of its points and must be an open set. So $p_s$ is open.


A key desirable property of a quotient space is that it is Hausdorff. The 
Hausdorff property is what makes limits unique, after all, and it therefore paves 
the way to doing some analysis with the space. The next proposition gives a 
useful necessary condition and a useful sufficient condition. 
Proposition 10.40. Let X be a topological space, let $\sim$ be an equivalence
relation on X, and let R be the subset $\{(x_1, x_2) \mid x_1 \sim x_2\}$ of $X \times X$. If the
quotient topology on $X/{\sim}$ is Hausdorff, then R is a closed subset of $X \times X$.
Conversely if R is a closed subset of $X \times X$ and if the quotient map $q : X \to X/{\sim}$
is open, then $X/{\sim}$ is Hausdorff.


PROOF. Suppose that $X/{\sim}$ is Hausdorff. If $(x, y)$ is not in R, then $q(x)$ and
$q(y)$ are distinct points in $X/{\sim}$. Find disjoint open sets U and V in $X/{\sim}$ such
that $q(x)$ is in U and $q(y)$ is in V. Then $q^{-1}(U)$ and $q^{-1}(V)$ are open sets in
X with the property that no member of $q^{-1}(U)$ is equivalent to any member of
$q^{-1}(V)$. Thus $q^{-1}(U) \times q^{-1}(V)$ is an open neighborhood of $(x, y)$ that does not
meet R. Hence R is closed.

Conversely if R is closed and $(x, y)$ is not in R, then there exists a basic open
set $U \times V$ of $X \times X$ containing $(x, y)$ that does not meet R. The sets $q(U)$ and
$q(V)$ are open in $X/{\sim}$ since q is open, they are disjoint since no member of U
is equivalent to a member of V, and they are neighborhoods of $q(x)$ and $q(y)$,
respectively. Thus $X/{\sim}$ is Hausdorff.


A special case is the situation with a pseudometric space in which the equiv- 
alence relation is that x ~ y if x and y are at distance 0 from one another. A 
generalization of this relation was given in Proposition 10.7, which said that in 
a regular topological space the relation $x \sim y$ if x is in $\{y\}^{\mathrm{cl}}$ is an equivalence
relation. The corollary to follow gives properties of the quotient space when this 
equivalence relation is used. 


Corollary 10.41. Let X be a regular topological space, let $\sim$ be the equivalence
relation defined by saying that $x \sim y$ if x is in $\{y\}^{\mathrm{cl}}$, and let $q : X \to X/{\sim}$ be
the quotient map. Then


(a) q is open, and every open set in X is the union of equivalence classes, 
(b) X/~ is regular and Hausdorff, 

(c) X normal implies X/~ normal, 

(d) X separable implies X/~ separable. 


PROOF. First we show that every open set is a union of equivalence classes.
Suppose that x is in an open set U in X. Let $x \sim y$. If y were not in U, then y
would be in the closed set $U^c$ and hence $\{y\}^{\mathrm{cl}}$ would be contained in $U^c$. Since
$x \sim y$, x is in $\{y\}^{\mathrm{cl}}$, and we are led to the contradiction that x would be in $U^c$,
hence in $U \cap U^c = \varnothing$. So U is a union of equivalence classes. Then it follows
that $q^{-1}(q(U)) = U$, and the set $q(U)$ has the property that its inverse image is
open in X. By definition of the quotient topology, $q(U)$ is open. Therefore q is
an open map. This proves (a).




To prove the Hausdorff property in (b), we shall apply Proposition 10.40. Since
(a) shows that q is open, it is enough to show that the subset $R = \{(x, y) \mid x \sim y\}$
of $X \times X$ is closed. If $(x, y)$ is not in R, then x is not in $\{y\}^{\mathrm{cl}}$. By regularity of
X, choose disjoint open sets U and V in X such that x is in U and $\{y\}^{\mathrm{cl}} \subseteq V$.
Since U and V are unions of equivalence classes and are disjoint, no member of
U is equivalent to any member of V. Therefore $(U \times V) \cap R = \varnothing$, and every
point of $R^c$ has an open neighborhood lying in $R^c$. Hence R is closed.

As a result of (a), the open sets in X are in one-one correspondence via q with 
the open sets in X/~, and the same thing is true for the closed sets. Under this 
correspondence disjoint sets correspond to disjoint sets. Then regularity in (b), 
as well as conclusions (c) and (d), follow immediately. 
According to Proposition 10.31, a Hausdorff topological space has unique limits
for convergent sequences and nets. Corollary 10.41 shows that regularity of a 
space makes it possible to pass to a natural quotient space that is regular and 
Hausdorff. The following theorem exhibits a special role for the condition that a 
space be normal. 


Theorem 10.42 (Urysohn’s Lemma). If E and F are disjoint closed sets in
a normal topological space X, then there exists a continuous function f from X
into [0, 1] that is 0 on E and is 1 on F.


PROOF. Proposition 10.5c shows in a normal space that between a closed
set and a larger open set we can always interpolate an open set and its closure.
Starting from $E \subseteq F^c$, we find an open set $U_{1/2}$ with

$$E \subseteq U_{1/2} \subseteq (U_{1/2})^{\mathrm{cl}} \subseteq F^c.$$

Then we can find open sets $U_{1/4}$ and $U_{3/4}$ with

$$E \subseteq U_{1/4} \subseteq (U_{1/4})^{\mathrm{cl}} \subseteq U_{1/2} \subseteq (U_{1/2})^{\mathrm{cl}} \subseteq U_{3/4} \subseteq (U_{3/4})^{\mathrm{cl}} \subseteq F^c.$$

Proceeding inductively on n, we obtain, for each dyadic rational number $r = m/2^n$
with $0 < r < 1$, an open set $U_r$ between E and $F^c$ such that $r < s$ implies
$(U_r)^{\mathrm{cl}} \subseteq U_s$. Put $U_1 = X$. For each x in X, define $f(x)$ to be the greatest
lower bound of all r such that x is in $U_r$. Then f is 0 on E, is 1 on F, and has
values in [0, 1]. To see that f is continuous, let x be given, let r and s be dyadic
rationals in (0, 1) with $r < f(x) < s$, and choose dyadic rationals $r'$ and $s'$ with
$r < r' < f(x) < s' < s$. (If $f(x) = 0$, we omit r and $r'$; if $f(x) = 1$, we omit s
and $s'$.) We are to produce an open neighborhood U of x with $f(U) \subseteq (r, s)$. If
$U = U_{s'} - (U_{r'})^{\mathrm{cl}}$, then U is open with $r' \le f(U) \le s'$. Thus $r < f(U) < s$ as
required. We conclude that f is continuous.
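
In a metric space the function promised by Urysohn’s Lemma can be written down directly, which gives a quick check on the statement. The formula below is a standard one and is not part of the proof above; it assumes that X carries a metric d, that E and F are nonempty, and that $d(x, A)$ denotes $\inf_{a \in A} d(x, a)$.

```latex
% Sanity check in the metric case only; d is an assumed metric on X.
\[
  f(x) \;=\; \frac{d(x,E)}{d(x,E) + d(x,F)}, \qquad x \in X .
\]
% The denominator is never 0: d(x,E) = 0 forces x \in E and d(x,F) = 0
% forces x \in F because E and F are closed, and E \cap F = \varnothing.
% Continuity follows from the continuity of x \mapsto d(x,A).
Then $0 \le f \le 1$, $f = 0$ exactly on $E$, and $f = 1$ exactly on $F$.
```

In a general normal space no metric is available, which is why the proof builds f from the nested family of open sets $U_r$ instead.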
EXAMPLE. In Example 4 of Section 2, we produced a certain Hausdorff regular
space X that is not normal, but we deferred the proof that X is not normal until we
had Urysohn’s Lemma in hand. We can now give that missing proof. As a set, X
is the closed upper half plane $\{\operatorname{Im} z \ge 0\}$ in $\mathbb{C}$. A base for the topology in question
consists of all open disks in X that do not meet the x axis, together with all open
disks in X that are tangent to the x axis; the latter sets are to include the point of
tangency. For a point p on the x axis, the open disks of rational radii with point of
tangency p form a countable local base. Arguing by contradiction, suppose that
X is normal. Any subset of the x axis in X is closed in X, and we take E to be the
set of rationals on the axis and F to be the set of irrationals on the axis. Urysohn’s
Lemma (Theorem 10.42) supplies a continuous function $f : X \to [0, 1]$ such
that $f(E) = 0$ and $f(F) = 1$. Define a sequence of functions $f_n : \mathbb{R} \to [0, 1]$
by $f_n(x) = f(x, \tfrac{1}{n})$, the notation $(x, y)$ indicating a point in the $(x, y)$ plane.
The functions $f_n$ are continuous in the ordinary topology on $\mathbb{R}$ since the topology
on X is the ordinary topology of the half plane as long as we stay away from the
x axis. At any point $(x, 0)$ of the x axis, the sets

$$U_n = \{(x, 0)\} \cup B\big(\tfrac{1}{n};\, (x, \tfrac{1}{n})\big)$$

form a local base at $(x, 0)$, and $(x, \tfrac{1}{m})$ is in $U_n$ for $m \ge n$. The continuity of f
therefore yields $\lim_n f(x, \tfrac{1}{n}) = f(x, 0)$. In other words, $\lim_n f_n$ exists pointwise
on $\mathbb{R}$ and equals the indicator function of the set of irrationals. The sequence $\{f_n\}$
is therefore a sequence of continuous real-valued functions on $\mathbb{R}$ whose pointwise
limit is everywhere discontinuous. However, Theorem 2.54 implies that the set
of discontinuities of the limit function is of first category in $\mathbb{R}$, and the Baire
Category Theorem (Theorem 2.53) implies that $\mathbb{R}$ is not of first category in itself.


Thus we have a contradiction, and we conclude that X cannot be normal. 


Corollary 10.43. If E and F are disjoint closed sets in a compact Hausdorff
space X, then there exists a continuous function $f : X \to [0, 1]$ that is 0 on E
and is 1 on F.


PROOF. This follows by combining Proposition 10.16 and Theorem 10.42. 


Corollary 10.44. If K and F are disjoint closed sets in a locally compact
Hausdorff space X and if K is compact, then there exists a continuous function
$f : X \to [0, 1]$ that is 1 on K, is 0 on F, and has compact support.


PROOF. Using Proposition 10.19, regard X as an open subset of the one-point
compactification $X^*$. Proposition 10.20 shows that the compact space $X^*$ is
Hausdorff. Choose disjoint open sets U and V in X by Corollary 10.22 such
that $K \subseteq U$ and $F \subseteq V$. Choose L compact in X by Corollary 10.23 such
that $K \subseteq L^o$. Then $M = L \cap (X - V)$ is compact in X by Proposition 10.14,
and $K \subseteq L^o \cap U \subseteq L^o \cap (X - V)^o \subseteq (L \cap (X - V))^o = M^o$. Hence
K and $X^* - M^o$ are disjoint compact sets in $X^*$. Corollary 10.43 produces a
continuous $g : X^* \to [0, 1]$ such that g is 1 on K and is 0 on $X^* - M^o$. Since
$F \subseteq V \subseteq (X - L) \cup V = X - (L \cap (X - V)) = X - M \subseteq X - M^o \subseteq X^* - M^o$,
the function $f = g|_X$ has the required properties.


8. Metrization in the Separable Case 


A problem about topological spaces, now completely solved, is to characterize 
those topologies that arise from metric spaces. Such a space is said to be metriz- 
able. We consider only the separable case and prove the following theorem. 


Theorem 10.45 (Urysohn Metrization Theorem). Any separable regular Hausdorff
space X is homeomorphic to a subspace of the Hilbert cube $C = \prod_{n=1}^{\infty} [0, 1]$
and is therefore metrizable.


PROOF. The Hilbert cube C is seen as a metric space in Example 11 in Section
II.1, Corollary 10.29 identifies it as a product space, and the Tychonoff Product
Theorem (Theorem 10.27) shows that it is compact. Let $p_n : C \to [0, 1]$ be the
$n$-th coordinate function.

By Corollary 10.10, X is normal. Fix a countable base $\mathcal{B}$ for the open sets.
Enumerate the countable set of pairs $(U, V)$ of members of $\mathcal{B}$ such that $U^{\mathrm{cl}} \subseteq V$.
To the $n$-th pair, associate by Urysohn’s Lemma (Theorem 10.42) a continuous
function $f_n : X \to [0, 1]$ such that $f_n$ is 1 on $U^{\mathrm{cl}}$ and is 0 on $V^c$. Let $F : X \to C$
be defined by “$F(x)$ is the sequence whose $n$-th term is $f_n(x)$.” We are to show
that F is continuous, is one-one, and is open as a function onto $F(X)$.

The continuity of $p_n \circ F = f_n$ for each n means that $F^{-1} p_n^{-1}$ of any open set
in [0, 1] is open in X. Since $F^{-1}$ of a basic open set in C is the finite intersection
of the various $F^{-1} p_n^{-1}$’s of open sets, F is continuous.

To see that F is one-one, let x and y be distinct points of X. By Proposition
10.6c, X Hausdorff implies that $\{y\}$ is closed and hence that $\{y\}^c$ is an open
neighborhood of x. Choose a basic open set V containing x and contained in
$\{y\}^c$. By Proposition 10.5b and the regularity of X, choose a basic open set
U containing x such that $U^{\mathrm{cl}} \subseteq V$. Then $(U, V)$ is one of our pairs, and the
corresponding function $f_n$ has $f_n(x) = 1$ and $f_n(y) = 0$. Hence $F(x) \ne F(y)$,
and F is one-one.

To see that F carries open sets of X to open sets in $F(X)$, let W be open in
X, and fix x in W. Arguing as in the previous paragraph, we can find basic open
sets U and V such that x is in U and $U^{\mathrm{cl}} \subseteq V \subseteq W$. The corresponding $f_n$ then
has $f_n(x) = 1$ and $f_n(V^c) = 0$. Hence $f_n(W^c) = 0$. The set $N_x$ of y’s such that
$f_n(y) > 0$ is open in X and contains x. The product $(0, 1]_n \times \big(\prod_{k \ne n} [0, 1]_k\big)$ is
open in C, and its intersection with $F(X)$ is the same as $F(N_x) \cap F(X)$. Thus
$F(N_x) \cap F(X)$ is relatively open in $F(X)$. Then $F(x)$ lies in this relatively open
set, which in turn lies in $F(W)$, and it follows that $F(W)$ is a relatively open
neighborhood of each of its members.
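
The embedding F can be turned into an explicit metric on X. The sketch below assumes one standard choice of metric on the Hilbert cube; the formula used in Example 11 of Section II.1 may differ in its weights, but any metric that yields the product topology on C serves the same purpose. The symbols d and $\rho$ are introduced here for illustration only.

```latex
% Assumed metric on C; the weights 2^{-n} are one common choice.
\[
  d\big((a_n), (b_n)\big) \;=\; \sum_{n=1}^{\infty} 2^{-n}\, |a_n - b_n| ,
  \qquad (a_n),\, (b_n) \in C .
\]
% Pulling d back along the embedding F(x) = (f_n(x)) gives a metric on X.
\[
  \rho(x, y) \;=\; d\big(F(x), F(y)\big)
           \;=\; \sum_{n=1}^{\infty} 2^{-n}\, |f_n(x) - f_n(y)| ,
  \qquad x,\, y \in X .
\]
% \rho(x,y) = 0 forces f_n(x) = f_n(y) for all n, hence x = y since F is
% one-one; the topology of \rho is that of X since F is a homeomorphism
% of X onto F(X).
```

With this choice the series converges because each term is at most $2^{-n}$.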


Corollary 10.46. Every separable compact Hausdorff space is metrizable. 


PROOF. This is immediate from Proposition 10.16 and Theorem 10.45. 


9. Ascoli–Arzelà and Stone–Weierstrass Theorems


In Section II.10 we studied Ascoli’s Theorem (Theorem 2.56) and the Stone–Weierstrass
Theorem (Theorem 2.58) as tools for working with continuous
tions on compact metric spaces. In turn, these theorems were illuminating 
generalizations of results about continuous functions on closed bounded intervals 
of the line, particularly the classical version of Ascoli’s Theorem (Theorem 1.22) 
and the Weierstrass Approximation Theorem (Theorem 1.52). In this section 
we shall extend these results to the setting of continuous functions on compact 
Hausdorff spaces. The proof of the extended Ascoli theorem will be our first 
example of how the Cantor diagonal process gets replaced by an application 
of the Tychonoff Product Theorem (Theorem 10.27) when one is dealing with 
an uncountable number of limiting situations at once. The Stone–Weierstrass
Theorem in the more general setting becomes in part a tool for dealing with large
abstract compact Hausdorff spaces that arise in functional analysis. The starting
point for this investigation is the general form of Alaoglu’s Theorem,⁸ which says
that the closed unit ball in the dual X* of a normed linear space X is compact in 
the weak-star topology; closed subsets of this space play a foundational role in 
the theory of Banach algebras. 

We work in this section with a compact Hausdorff space X and with the algebra 
C(X) of bounded continuous scalar-valued functions on X. The scalars may be 
real or complex. Corollary 10.13 shows that if f is a continuous scalar-valued 
function on X, then | f| attains its maximum value on X. The set C(X) is a 
subspace of the normed linear space B(X) of bounded scalar-valued functions 
on X, the norm being $\|f\|_{\sup} = \sup_{x \in X} |f(x)|$. Convergence in B(X) is uniform
convergence. Proposition 10.30 shows that C(X) is a closed subspace of B(X) 
and is complete as a metric space. 

We begin with the extended Ascoli theorem. Let $\mathcal{F} = \{f_\alpha\}$ be a set of
scalar-valued functions on the compact Hausdorff space X. We say that $\mathcal{F}$ is
equicontinuous at x in X if for each $\epsilon > 0$, there is an open neighborhood $U_{x,\epsilon}$


⁸ A preliminary form of this theorem was given as Theorem 5.58. The general form appears in
the companion volume Advanced Real Analysis.
of x such that $|f_\alpha(y) - f_\alpha(x)| < \epsilon$ for all y in $U_{x,\epsilon}$ and all $f_\alpha$ in $\mathcal{F}$. We say
that $\mathcal{F}$ is equicontinuous if it is equicontinuous at each point. Not having a
metric to compare different points of X, we no longer define a notion of “uniform
equicontinuity.”

It is immediate from the definition that any subset of an equicontinuous family 
is equicontinuous. The definition of equicontinuity at x reduces to the definition
of continuity if $\mathcal{F}$ has just one member, and therefore every member of an
equicontinuous family is continuous. 

As in Section II.10 the set $\mathcal{F}$ is uniformly bounded on X if it is pointwise
bounded at each $x \in X$ and if the bound for the values $|f(x)|$ with $f \in \mathcal{F}$ can be
taken independent of x.
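
Two concrete families on $X = [0, 1]$ may make the definitions easier to test. They are illustrations of ours and do not come from the text; the names $f_n$ and $g_n$ are used only within this example.

```latex
% Illustration only, on the compact Hausdorff space X = [0,1].
Let $f_n(x) = \dfrac{\sin(nx)}{n}$ for $n \ge 1$. Since $|f_n'(x)| = |\cos(nx)| \le 1$,
the Mean Value Theorem gives
\[
  |f_n(y) - f_n(x)| \le |y - x| \qquad \text{for all } n \text{ and all } x, y \in [0,1],
\]
so $\{f_n\}$ is equicontinuous at every $x$, with
$U_{x,\epsilon} = \{\, y \in [0,1] \mid |y - x| < \epsilon \,\}$, and it is uniformly
bounded by $1$.

By contrast let $g_n(x) = \sin(nx)$ for $n \ge 2$. Then $g_n(0) = 0$ and
$g_n\!\big(\tfrac{\pi}{2n}\big) = 1$ with $\tfrac{\pi}{2n} \to 0$, so no neighborhood of $0$
works for $\epsilon = 1$: the family $\{g_n\}$ is uniformly bounded but is not
equicontinuous at $0$.
```

The first family converges uniformly to 0, in line with what the extended Ascoli theorem below predicts for equicontinuous, pointwise bounded families.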


Lemma 10.47. If $\mathcal{F} = \{f_\alpha\}$ is equicontinuous at x in X, then the closure $\mathcal{F}^{\mathrm{cl}}$
of $\mathcal{F}$ in the product topology on $\mathbb{C}^X$ is equicontinuous at x.


REMARK. Consequently every member of $\mathcal{F}^{\mathrm{cl}}$ is continuous at x.


PROOF. Let $U_{x,\epsilon}$ be as in the definition of equicontinuity of $\mathcal{F}$ at x. For each
$\epsilon > 0$, the set of functions $f \in \mathbb{C}^X$ such that

$$|f(y) - f(x)| \le \epsilon$$

for a particular y in X is a closed subset of $\mathbb{C}^X$. Thus the set of functions $f \in \mathbb{C}^X$
such that this inequality holds for all y in $U_{x,\epsilon}$, being an intersection of closed sets,
is closed, and it contains $\mathcal{F}$. In turn, the intersection $\mathcal{G}$ of these sets taken over
all $\epsilon > 0$ is closed in $\mathbb{C}^X$ and contains $\mathcal{F}$. For each $\epsilon > 0$, each g in this closed set
$\mathcal{G}$ satisfies the inequality $|g(y) - g(x)| < 2\epsilon$ whenever y is in $U_{x,\epsilon}$. Therefore
$\mathcal{G}$ is equicontinuous at x, and so is its subset $\mathcal{F}^{\mathrm{cl}}$.


Theorem 10.48 (Ascoli–Arzelà Theorem). If $\{f_n\}$ is an equicontinuous family
of scalar-valued functions defined on a compact Hausdorff space X and if $\{f_n\}$
has the property that $\{f_n(x)\}$ is bounded for each x, then $\{f_n\}$ has a uniformly
convergent subsequence.


PROOF. We may assume that there are infinitely many distinct functions $f_n$,
since otherwise the assertion is trivial. Let $|f_n(x)| \le c_x$ for all n, and form the
product space $C = \prod_{x \in X} \{z \in \mathbb{C} \mid |z| \le c_x\}$. The space C is compact by the
Tychonoff Product Theorem (Theorem 10.27), and we are now assuming that
there are infinitely many members of the sequence $\{f_n\}$ in the space. Let S be the
image of the sequence as a subset of C. If S were to contain all its limit points,
then each $f_n$ would have an open neighborhood in C disjoint from the rest of S;
these open sets and $S^c$ would form an open cover of C with no finite subcover,
in contradiction to compactness of C. Thus S has a limit point f not in S. By
Lemma 10.47 and the remarks before it, the family $S \cup \{f\}$ is equicontinuous.
Let $\epsilon > 0$. We shall complete the proof by producing an $f_N$ in S such that
$|f_N(x) - f(x)| < \epsilon$ for all x. By equicontinuity find an open neighborhood $U_x$
for each x such that $y \in U_x$ implies

$$|f_n(y) - f_n(x)| < \epsilon/3 \ \text{ for all } n \quad\text{and}\quad |f(y) - f(x)| < \epsilon/3.$$

The open sets $U_x$ cover X, and finitely many of them suffice to cover, by the
compactness of X. Thus there are finitely many points $x_1, \dots, x_k$ in X with the
property that for each y in X, there is some $x_j$ with $1 \le j \le k$ such that

$$|f_n(y) - f_n(x_j)| < \epsilon/3 \quad\text{and}\quad |f(y) - f(x_j)| < \epsilon/3$$

for all n. Since f is a limit point of S, choose N such that

$$|f_N(x_j) - f(x_j)| < \epsilon/3 \quad\text{for } 1 \le j \le k.$$

Then for every y in X, there is an $x_j$ such that

$$|f_N(y) - f(y)| \le |f_N(y) - f_N(x_j)| + |f_N(x_j) - f(x_j)| + |f(x_j) - f(y)| < \epsilon.$$

Thus $f_N$ is within distance $\epsilon$ of f, as asserted.


Corollary 10.49. If X is a compact Hausdorff space, then a subset $\mathcal{F} = \{f_\alpha\}$
of C(X) is compact if and only if
(a) $\mathcal{F}$ is closed in C(X),
(b) the set $\{f_\alpha\}$ is pointwise bounded at each point in X, and
(c) $\mathcal{F}$ is equicontinuous.


In this case, $\mathcal{F}$ is uniformly bounded.


PROOF. Suppose that the three conditions hold. Being a subset of C(X), $\mathcal{F}$ is
a metric space under the restriction of the metric. By Theorem 2.36, $\mathcal{F}$ will be
compact if we prove that every sequence has a convergent subsequence. Because
of (b) and (c), Theorem 10.48 shows that every sequence in $\mathcal{F}$ has a uniformly
Cauchy subsequence. By (a) and the completeness of C(X) given in Proposition
10.30, $\mathcal{F}$ is complete as a metric space. Hence the Cauchy subsequence converges
to an element of $\mathcal{F}$.

Conversely suppose that $\mathcal{F}$ is compact. Property (a) follows since compact
sets are closed in any metric space. For (b) and the stronger conclusion that
$\mathcal{F}$ is uniformly bounded, the function $f \mapsto \|f\|_{\sup}$ is a continuous function
on the compact set $\mathcal{F}$, and Corollary 10.13 shows that it is bounded. For the
equicontinuity in (c), let $\epsilon > 0$ and x be given. Theorem 2.46 shows that $\mathcal{F}$
is totally bounded as a metric space. Hence we can find a finite set $f_1, \dots, f_l$
in $\mathcal{F}$ such that each member f of $\mathcal{F}$ has $\sup_{y \in X} |f(y) - f_j(y)| < \epsilon$ for some
j. By continuity of each $f_i$, choose an open neighborhood $U_{x,\epsilon}$ of x such that
$|f_i(x) - f_i(y)| < \epsilon$ for $1 \le i \le l$ for all y in $U_{x,\epsilon}$. If f is some member of $\mathcal{F}$
and if $f_j$ is the member of the finite set associated with f, then $y \in U_{x,\epsilon}$ implies

$$|f(y) - f(x)| \le |f(y) - f_j(y)| + |f_j(y) - f_j(x)| + |f_j(x) - f(x)| < 3\epsilon.$$

Hence $\mathcal{F}$ is equicontinuous at each x in X.


Now we come to the extended Stone–Weierstrass Theorem. We are interested
in showing that certain subalgebras of the algebra C(X) of continuous scalar- 
valued functions on a compact Hausdorff space X are dense in C(X). Except for 
the dropping of the assumption that X is metric, the assumptions and notation 
are the same as in Section II.10. In particular the scalars for the subalgebra and 
for C(X) may be real or complex, and the statement of the theorem is slightly 
different in the two cases. 


Theorem 10.50 (Stone–Weierstrass Theorem). Let X be a compact Hausdorff
space.

(a) If $\mathcal{A}$ is a real subalgebra of real-valued members of C(X) that separates
points and contains the constant functions, then $\mathcal{A}$ is dense in the algebra
of real-valued members of C(X) in the uniform metric.

(b) If $\mathcal{A}$ is a complex subalgebra of C(X) that separates points, contains the
constant functions, and is closed under complex conjugation, then $\mathcal{A}$ is
dense in C(X) in the uniform metric.


REMARKS. Curiously, Urysohn’s Lemma (Corollary 10.43) does not play a 
role in the proof. Instead, the role of Urysohn’s Lemma is to ensure that C(X)
is large in applications, and then the present theorem has serious content. The 
actual proof of Theorem 10.50 is word-for-word the same as for Theorem 2.58, 
and there is no need to repeat it. 


10. Problems 


1. Let f and g be continuous functions from a topological space X into a Hausdorff
space Y.

(a) Prove that the set of all points x in X for which f (x) = g(x) is closed. 
(b) Prove that if f(x) = g(x) for all x in a dense subset of X, then f = g. 

2. (Dini’s Theorem) Let X be a compact Hausdorff space. Suppose that the
function $f_n : X \to \mathbb{R}$ is continuous, that $f_1 \le f_2 \le f_3 \le \cdots$, and that
$f(x) = \lim f_n(x)$ is continuous and is nowhere $+\infty$. Use the defining property
of compactness to prove that $\{f_n\}$ converges to f uniformly on X.
3. (Baire Category Theorem) Prove that a locally compact Hausdorff space cannot
be the countable union of closed nowhere dense sets.

4. Prove that a locally compact dense subset of a Hausdorff space is open.

5. This problem produces a locally compact Hausdorff space that is not normal.
Verify the details of the construction. Let X be a countably infinite discrete
space, and let Y be an uncountable discrete space. Let $X^*$ and $Y^*$ be their
one-point compactifications, with the added points denoted by $x_\infty$ and $y_\infty$. The
locally compact Hausdorff space is $Z = X^* \times Y^* - \{(x_\infty, y_\infty)\}$ with the relative
topology. Two closed subsets that cannot be separated by disjoint open sets are
$A = (\{x_\infty\} \times Y^*) - \{(x_\infty, y_\infty)\}$ and $B = (X^* \times \{y_\infty\}) - \{(x_\infty, y_\infty)\}$.

6. If X is compact, prove that each infinite subset of X has a limit point.


7. Let $\mathcal{U}$ be the family of subsets of $\mathbb{R}$ consisting of all sets $\{x \in \mathbb{R} \mid x < a\}$,
together with $\varnothing$ and $\mathbb{R}$.

(a) Prove that $\mathcal{U}$ is a topology for $\mathbb{R}$ and that it is not Hausdorff. (It is called the
upper topology of $\mathbb{R}$.)

(b) If $\{t_n\}_{n \in D}$ is a net in $\mathbb{R}$, define $\limsup_n t_n$ to be the infimum over n of
$\sup_{m \in D,\, m \ge n} t_m$. Prove that a net $\{t_n\}_{n \in D}$ in $\mathbb{R}$ converges to t relative to $\mathcal{U}$ if and
only if $\limsup_n t_n \le t$.

8. Let $(X, \mathcal{T})$ be a topological space, and let $\mathcal{U}$ be the upper topology of $\mathbb{R}$ as in the
previous problem. A function $f : X \to \mathbb{R}$ is said to be upper semicontinuous
if it is continuous with respect to $\mathcal{T}$ and $\mathcal{U}$.

(a) Prove that upper semicontinuity of $f : X \to \mathbb{R}$ is equivalent to the condition
that $\limsup f(x_n) \le f(x)$ whenever $x_n \to x$ in X.

(b) Prove that the function $f : \mathbb{R} \to \mathbb{R}$ that is 1 at $x = 0$ and is 0 elsewhere is
upper semicontinuous.

(c) Prove that if f and g are upper semicontinuous functions on X and if c is
nonnegative real, then $f + g$ and $cf$ are upper semicontinuous.

(d) Prove that if $\{f_s\}_{s \in S}$ is a nonempty set of upper semicontinuous functions
on X such that $\inf_{s \in S} f_s(x) > -\infty$ for all $x \in X$, then $\inf_{s \in S} f_s$ is upper
semicontinuous.

(e) Prove that if f is a bounded real-valued function on X, then there exists a
unique smallest upper semicontinuous function $f^-$ with $f^-(x) \ge f(x)$ for
all x.

9. Let $(X, \mathcal{T})$ be a topological space. A function $f : X \to \mathbb{R}$ is lower
semicontinuous if $-f$ is upper semicontinuous. In this case if f is bounded, let
$f_- = -(-f)^-$, with the right side defined as in the previous problem. Let the
oscillation $\mathrm{osc}_f$ of f be defined by $\mathrm{osc}_f(x) = f^-(x) - f_-(x)$ for x in X.

(a) Why is $\mathrm{osc}_f$ upper semicontinuous?

(b) Prove that this definition agrees with the one in Section II.9.

(c) Prove that f is continuous if and only if $\mathrm{osc}_f$ is identically 0.
10. Let X be a Hausdorff topological space in which there are two disjoint nonempty
closed sets A and B. Let $\sim$ be the equivalence relation that identifies all elements
of A with each other, identifies all elements of B with each other, and otherwise
identifies no distinct points of X.

(a) Prove that the subset of pairs $(x, y)$ in $X \times X$ with $x \sim y$ is closed.
(b) Give an example of this kind in which $X/{\sim}$ is not Hausdorff.


11. Let X be a compact Hausdorff space, and let $\sim$ be an equivalence relation on
X such that the subset $R \subseteq X \times X$ of pairs $(x, y)$ with $x \sim y$ is closed. Let
$q : X \to X/{\sim}$ be the quotient map.

(a) Prove for each $x \in X$ that $q^{-1}q(x)$ is a closed subset of X.

(b) If $U \subseteq X$ is open, prove that $V = \{x \in X \mid q^{-1}q(x) \subseteq U\}$ is open by
first proving that $V^c = p_2((U^c \times X) \cap R)$, where $p_2 : X \times X \to X$ is the
projection to the second coordinate.

(c) Prove that the compact quotient $X/{\sim}$ is Hausdorff.

(d) Prove that the quotient map is closed, i.e., that closed sets map to closed sets.

(e) Is the quotient map necessarily open?

(f) As in one of the examples in Section 1, let X be the interval $[-\pi, \pi]$, and
let $S^1$ be the unit circle in $\mathbb{C}$. Let $\sim$ be the equivalence relation that lets $-\pi$
and $\pi$ be the only nontrivial pair of elements of X that are equivalent, and
form $X/{\sim}$. Prove that $X/{\sim}$ is homeomorphic to $S^1$ and that under this
identification the quotient map may be taken to be the function $p : X \to S^1$
given by $p(x) = e^{ix}$.


Problems 12–15 concern connectedness and connected components. Most of the
definitions and proofs in the first three are rather similar to those in Chapter II (§II.8
and Problems 11–13) for the special case of metric spaces. A topological space X is
connected if X cannot be written as $X = U \cup V$ with U and V open, disjoint, and
nonempty. A subset E of X is connected if E is connected as a subspace of X, i.e.,
if E cannot be written as a disjoint union $(E \cap U) \cup (E \cap V)$ with U and V open in
X and with $E \cap U$ and $E \cap V$ both nonempty.


12. (a) Prove that a continuous function between topological spaces carries
connected sets to connected sets.

(b) A path in a topological space X is a continuous function from a closed
bounded interval $[a, b]$ into X. Why is the image of a path necessarily
connected?

13. (a) If X is a topological space and $\{E_\alpha\}$ is a system of connected subsets of X
with a point $x_0$ in common, prove that $\bigcup_\alpha E_\alpha$ is connected.

(b) If X is a topological space and E is a connected subset of X, prove that the
closure $E^{\mathrm{cl}}$ is connected.

14. (a) A topological space X is pathwise connected if for any two points $x_1$ and
$x_2$ in X, there is some continuous $p : [a, b] \to X$ with $p(a) = x_1$ and
$p(b) = x_2$. Why is a pathwise-connected space X necessarily connected?

(b) A topological space X is called locally pathwise connected if each point
has arbitrarily small open neighborhoods that are pathwise connected. Prove
that if X is connected and locally pathwise connected, then it is pathwise
connected.


15. In a topological space X, define two points to be equivalent if they lie in a 
connected subset of X. 
(a) Show that this notion of equivalence is indeed an equivalence relation. The 
equivalence classes are called the connected components of X. 
(b) Prove that the connected components of X are closed sets. 
(c) Prove that the connected components of X are open sets if X is locally 
connected, i.e., if each point has arbitrarily small connected neighborhoods. 


Problems 16-17 concern partitions of unity, which were introduced in Section III.5. 
An open cover 𝒰 of a topological space X is said to be locally finite if each point of X 
has a neighborhood that lies in only finitely many members of 𝒰. 

16. Suppose that 𝒰 is a locally finite open cover of a normal space X. By applying 
Zorn’s Lemma to the class of all functions F defined on subfamilies of 𝒰 such 
that F(U), for each U in the domain of F, is an open set with F(U)^cl ⊆ U and 

\[ \bigcup_{U \in \mathrm{domain}(F)} F(U) \;\cup \bigcup_{\substack{V \in \mathcal{U},\\ V \notin \mathrm{domain}(F)}} V \;=\; X, \]

prove that it is possible to select, for each U in 𝒰, an open set V_U such that 
(V_U)^cl ⊆ U and such that {V_U | U ∈ 𝒰} is an open cover of X. 


17. Prove that if 𝒰 is a locally finite open cover of a normal space X, then it is possible 
to select, for each U in 𝒰, a continuous function f_U : X → [0, 1] such that f_U 
is 0 outside U and such that ∑_{U ∈ 𝒰} f_U(x) = 1 for all x ∈ X. 


Problems 18-20 establish the Tietze Extension Theorem. Let X be a normal topological 
space, and let C be a closed subset of X. Suppose that f is a bounded real-valued 
continuous function defined on C. The theorem is that there exists a continuous 
function F : X → R such that F|_C = f and sup_{x∈X} |F(x)| = sup_{x∈C} |f(x)|. 


18. Let g_0 = f, c_0 = sup_{x∈C} |g_0(x)|, P_0 = {x ∈ C | g_0(x) ≥ c_0/3}, and N_0 = 
{x ∈ C | g_0(x) ≤ −c_0/3}. Show that there is a continuous function F_0 from X 
into [−c_0/3, c_0/3] that is c_0/3 on P_0 and −c_0/3 on N_0. 


19. In the previous problem, put g_1 = g_0 − F_0 on C, and let c_1 = sup_{x∈C} |g_1(x)|. 
Show that c_1 ≤ (2/3) c_0. When the result of the previous problem is applied to g_1 in 
order to produce a function F_1, what properties does F_1 have? 


20. Show that iteration of the above results produces a sequence of continuous 
functions F_n : X → R such that the series ∑_{n=0}^∞ F_n(x) is uniformly convergent 
on X and such that the sum F(x) = ∑_{n=0}^∞ F_n(x) is continuous. Show also that 
F has F|_C = f and satisfies sup_{x∈X} |F(x)| = sup_{x∈C} |f(x)|. 
Problems 21-28 concern order topologies. Suppose that X is a set with at least two 
elements and having a total ordering, i.e., a partial ordering ≤ such that 

(i) x ≤ y and y ≤ x together imply x = y, 

(ii) any x and y in the set have either x ≤ y or y ≤ x. 
Define x < y to mean that x ≤ y and x ≠ y. The order topology on X is the topology 
for which a base consists of all sets {x | x < b}, {x | a < x}, and {x | a < x < b}. 
For a nonempty subset Y of X, the terms “lower bound,” “upper bound,” “greatest 
lower bound,” and “least upper bound” are defined in the expected way. Examples 
are given by the real line R with its usual topology, the set Ω of countable ordinals 
(as defined in Problems 25-33 at the end of Chapter V) with its order topology, and 
other examples given below. 


21. Prove that every open interval {x | a < x < b} in X is open and every closed 
interval {x | a ≤ x ≤ b} is closed. 


22. Prove that X is Hausdorff and regular in its order topology. 


23. Prove that every nonempty subset with an upper bound has a least upper bound if 
and only if every nonempty subset with a lower bound has a greatest lower 
bound. In this case, X is said to be order complete. 


24. Suppose that X is order complete. 
(a) Prove that a nonempty subset Y of X is compact if and only if Y is closed 
and has a lower bound and an upper bound. 
(b) Prove that X is locally compact. 


25. (a) Prove that if there exist a and b in X with a < b and with no c such that 
a < c < b, then X is not connected, in the sense of Problems 12-15. Let us 
say that X has a gap when such a and b exist. 

(b) Prove that if X is order complete and has no gaps, then X is connected. 


26. The set X = [0, 1) ∪ [2, 3) is totally ordered. Prove that this X is connected 
in its order topology, and conclude that the order topology is different from the 
relative topology for X as a subspace of R. 


27. The set X = [0, 1) ∪ (1, 2] is totally ordered. Prove that this X is not connected 
in its order topology but has no gaps. 


28. Let X and Y be two totally ordered sets with at least two elements apiece. 
Define the lexicographic ordering on X × Y to be the total ordering given by 
(x_1, y_1) ≤ (x_2, y_2) if x_1 < x_2 or else x_1 = x_2 and y_1 ≤ y_2. 
(a) Prove that the lexicographic ordering on [0, 1] × [0, 1] makes the space 
compact and connected but not separable. 
(b) The long line is defined to be the product Ω × [0, 1) with the lexicographic 
ordering, where Ω is the set of countable ordinals as defined in Problems 
25-33 at the end of Chapter V. Prove that the long line is locally compact 
and connected but not separable. 


CHAPTER XI 


Integration on Locally Compact Spaces 


Abstract. This chapter deals with the special features of measure theory when the setting is a 
locally compact Hausdorff space and when the measurable sets are the Borel sets, those generated 
by the compact sets. 

Sections 1-2 establish the basic theorem, the Riesz Representation Theorem, which says that any 
positive linear functional on the space C_com(X) of continuous scalar-valued functions of compact 
support on the underlying space X is given by integration with respect to a unique Borel measure 
having a property called regularity. The steps in the construction of the measure run completely 
parallel to those for Lebesgue measure if one regards the geometric information about lengths of 
intervals as being encoded in the Riemann integral. The Extension Theorem of Chapter V is the 
main technical tool. 

Section 3 studies more closely the nature of regularity of Borel measures. One direct 
generalization of a Euclidean theorem is that the space of continuous functions of compact support in an 
open set is dense in every L^p space on that open set for 1 ≤ p < ∞. A new result is the Helly-Bray 
Theorem, that any sequence of Borel measures of bounded total measure in a locally compact 
separable metric space has a weak-star convergent subsequence whose limit is a Borel measure. 

Section 4 regards C_com(X) as a normed linear space under the supremum norm and identifies the 
space of continuous linear functionals, with its norm, as a space of signed or complex Borel measures 
with a regularity property, the norm being the total-variation norm for the signed or complex Borel 
measure. 


1. Setting 


This chapter brings together the measure theory of Chapters V—VI and the theory 
of topological spaces of Chapter X in a way that takes many of our earlier most 
interesting examples into account. Specifically we shall study the special features 
of measure theory when the underlying space is a locally compact Hausdorff 
space. Our primary example from earlier is that of Lebesgue measure, first on 
R^1 and then in R^N. In R^1 we considered also the class of all Stieltjes measures 
and showed how they are classified by monotone functions satisfying certain 
properties. We introduced Borel measures in R^N but did not attempt to classify 
them. 

Along the way we saw glimpses of some other examples: The unit circle of C 
can be regarded as [−π, π] if we identify −π and π, and we obtained Lebesgue 
measure on the circle. As we saw, any open set or any compact set in R^N has 
a theory of Borel measures associated with it. Most of our concrete examples 
of such measures when N > 1 came about as a consequence of the 
change-of-variables formula for multiple integrals. Of particular interest is what we 
anticipated in Section VI.5 would ultimately come to be regarded as a 
“rotation-invariant measure on the sphere,” the sphere S^{N−1} being a compact metric space. 
This measure corresponds to the expression dω when Lebesgue measure dx on 
R^N is written in spherical coordinates and the factor r^{N−1} dr is dropped. In the 
concrete case of R^3, in which r is the radius, θ_1 is the latitude from the north pole, 
and θ_2 is the longitude, Lebesgue measure is given by dx = r^2 sin θ_1 dθ_2 dθ_1 dr 
and we have dω = sin θ_1 dθ_2 dθ_1. The change-of-variables formula in the 
N-variable case then reads 

\[ \int_{\mathbb{R}^N} f(x)\,dx = \int_{r=0}^{\infty} \int_{\omega \in S^{N-1}} f(r\omega)\, r^{N-1}\, d\omega\, dr \]

for every Borel measurable function f ≥ 0 on R^N. We shall be making sense of 
dω as a genuine measure on S^{N−1} in the course of the present chapter. 

In the opposite direction it is important not to get the idea that all important 
measure-theoretic examples in mathematics arise from locally compact Hausdorff 
spaces. Examples that arise from probability theory need not fit this pattern. This 
fact becomes clearer after one encounters some specific measure spaces that arise 
in the theory.¹ 

Let us turn to the setting of this chapter, a locally compact Hausdorff space X. 
In order that the measure theory have some connection with the topological-space 
structure, we shall build our σ-algebra out of topologically significant sets. There 
will be a choice for how to do so, and we come to that point in a moment. 

We shall follow as much as possible the pattern of the development of Lebesgue 
measure on an interval of R^1 or on all of R^1, as occurred in Chapter V, in order 
to construct measures on X. The thing that is missing for general X occurs right 
at the start: it is the kind of geometric information that goes into regarding the 
length of an interval as a quantity worthy of study. That is where an ingenious idea 
comes into play, that of studying linear functionals on the vector space C_com(X) 
of continuous scalar-valued functions on X that vanish off a compact subset of X. 
As in earlier chapters, it will not be important whether the scalars for C_com(X) 
are real or complex, and the reader may fix attention on either of these. 

On an interval [a, b], we thus consider the space C([a, b]) of scalar-valued 
continuous functions on the interval. The particular linear functional of interest 
is the Riemann integral ℓ(f) = R∫_a^b f(x) dx, the notation with the R being as 
in Section VI.4. This kind of integral is a fairly simple object analytically; it was 
quickly shown to make sense in Theorem 1.26. Our point of view will be that the 
Riemann integral encodes information about the lengths of all intervals. 


¹ The measure-theoretic foundations of probability theory are discussed in the companion volume 
Advanced Real Analysis. 

Why might one consider linear functionals? In the subject of linear algebra, 
linear functionals play an important role. Two important ways of realizing subsets 
of Euclidean space are parametric form and implicit form. In the case of a vector 
subspace of R^n, the idea of parametric form leads us to represent the subspace 
as all linear combinations of members of a spanning set. If we use implicit form 
instead, the subspace is realized as all vectors satisfying a set of homogeneous 
linear equations, thus as the kernel of some linear function. The most primitive 
case of the latter is that there is just one nontrivial equation. Then the linear 
function has range the scalars, and the linear function is a linear functional. When 
there are several equations, the subspace is in effect described as the intersection 
of the kernels of several linear functionals. 

Thus linear functionals in linear algebra arise in describing vector subspaces, 
specifically in describing subspaces by limiting their size from the outside. In 
analysis we have occasionally needed this kind of control of a subspace in proving 
theorems by an approximation argument. Two nontrivial examples were the 
proofs in Chapter VI of differentiation of integrals and the proof in Chapter IX of 
the boundedness of the Hilbert transform. In each case we proved a theorem for 
“nice” functions, and we obtained some estimate for all functions of interest. To 
connect the one conclusion with the other, we needed to know that the subspace 
of “nice” functions is dense. Corollary 6.4 was a result of this kind, saying that 
C_com(R^N) is dense in L^1(R^N) and in L^2(R^N). The proof given for Corollary 6.4 
was more like an argument using spanning sets, showing that we can pass from 
C_com(R^N) to simple functions and then recalling that simple functions are dense 
as a consequence of basic properties of the Lebesgue integral. 


However, we can visualize another argument of this kind, one with continuous 
linear functionals. If one could prove, for any proper closed vector subspace of 
our total space of functions (L^1 or L^2 or something else), that there is a nonzero 
continuous linear functional on the total space vanishing on the closed subspace, 
then we could test whether a given vector subspace is dense by examining the 
effect of continuous linear functionals when restricted to the subspace. Histor- 
ically this idea began to be applied in analysis in the early part of the twentieth 
century at about the same time that people began thinking frequently about spaces 
of functions and not just individual functions. The key general existence tool for 
such continuous linear functionals was the Hahn-Banach Theorem, which we 
shall take up in Chapter XII. 

In any event, out of this confluence of ideas arose the idea of considering 
continuous linear functionals on C_com(X) as capturing enough information about 
X to make measure theory possible. The continuity of a linear functional will 
actually be somewhat concealed in what we do for most of this chapter, and 
instead we impose on the linear functional the natural condition that it needs to 
satisfy in order to provide a notion of integration: that it be ≥ 0 on functions 
≥ 0. 

Let us be more precise about the definitions. Let X be a locally compact 
Hausdorff space, and let C_com(X) be the vector space of continuous scalar-valued functions 
on X that vanish outside some compact set. For a specific function f, the support 
of f is the closure of the set where f is not zero. The members of C_com(X) are then 
the continuous scalar-valued functions on X having compact support. A linear 
functional ℓ on C_com(X) is said to be positive if ℓ(f) ≥ 0 whenever f ≥ 0. 
The Riesz Representation Theorem, to be stated formally in Section 2 with all 
details in place, will say that to any such ℓ corresponds a measure μ on a certain 
σ-algebra of “topologically significant” sets such that 


\[ \ell(f) = \int_X f\,d\mu \quad \text{for all } f \in C_{\mathrm{com}}(X). \]


The “topologically significant” sets have to include the sets necessary to make 
each f in C_com(X) measurable. At first glance it might seem that the smallest 
σ-algebra containing the open sets is the right object. But in fact this σ-algebra 
is unnecessarily large. In an uncountable discrete space, we do not need to have 
every subset measurable in order to have all the functions of compact support be 
measurable. Accordingly we define the σ-algebra B(X) of Borel sets of X to be 
the smallest σ-algebra containing all compact subsets of X. 

The plan of attack now follows the steps in the construction of Lebesgue 
measure. We take the compact subsets of X to be the analog of the bounded 
intervals in R^1, and we thus define the “elementary sets” in X to be the sets in the 
smallest ring K(X) containing all the compact sets. In the case of R^1, every set 
in the ring generated by the bounded intervals is a finite disjoint union of sets that 
are the difference of two bounded intervals. We shall prove for X in Section 2 
that every member of K(X) is a finite disjoint union of sets that are the difference 
of two compact sets. 

For R^1, we defined the measure of the difference of two bounded intervals to 
be the difference of their lengths as soon as the second interval is contained in 
the first; this was no loss of generality because the intersection of two bounded 
intervals is a bounded interval. The measure of a finite disjoint union was defined 
as the sum of the measures. We showed that this was well defined, and then we 
had a finite-valued nonnegative additive set function on a ring of sets. 

For X, we define the measure of a compact set K by the natural formula 


\[ \mu(K) = \inf_{\substack{f \in C_{\mathrm{com}}(X),\\ f \ge I_K}} \ell(f), \]


where I_K as usual is the indicator function of K. The intersection of two compact 
sets is compact, and thus we can define the measure of K_1 − K_2, for K_1 and K_2 
compact, to be μ(K_1) − μ(K_1 ∩ K_2). We define the measure of the disjoint union 
of such sets K_1 − K_2 to be the sum of the measures. We have to prove that this is 
well defined, and then we have a finite-valued nonnegative additive set function 
μ on the ring K(X). 

The next step for R^1 was to prove complete additivity on the ring generated 
by the bounded intervals. With X, the problem is the same; we are to prove 
complete additivity on the ring K(X). Suppose that this has been done. Since 
μ is everywhere finite-valued on K(X), we can apply the Extension Theorem 
(Theorem 5.5) to extend μ to the generated σ-ring. Either this σ-ring is already 
the generated σ-algebra B(X), or Proposition 5.37 supplies a canonical extension 
to a measure on the generated σ-algebra B(X). This completes the construction 
of the measure μ on B(X). It is then a fairly easy matter to see that ℓ(f) is 
recovered as the integral of f if f is in C_com(X): In the case of R^1, we carried 
out this step by first establishing the Fundamental Theorem of Calculus for the 
Lebesgue integral of a continuous function; the argument appears at the end of 
Section V.3. A more direct argument would have been possible, and that direct 
argument works for general X. 

Thus the problem comes down to proving that the set function, as defined on 
the ring of sets, is actually completely additive on that ring. In the case of R^1, that 
complete additivity was an easy consequence of “regularity” of Lebesgue measure 
on the ring generated by the bounded intervals; in other words, the measure of 
any set in the ring could be approximated from within by the measure of compact 
sets in the ring and from without by the measure of open sets in the ring. Exactly 
the same approach works for general X, but the regularity has to be established. 

Quantitatively the construction of the measure comes down to defining μ(K) 
for K compact as above and then proving three identities: 


(i) \( \mu(K_1) + \mu(K_2) = \mu(K_1 \cup K_2) + \mu(K_1 \cap K_2) \) if K_1 and K_2 are compact, 
(ii) \( \sup_{f \in C_{\mathrm{com}}(X),\ 0 \le f \le I_U} \ell(f) = \mu(K) - \mu(K - U) \) if U is any open set contained in 
some compact set K, 
(iii) \( \sup_{K \subseteq U,\ K \text{ compact}} \mu(K) = \sup_{f \in C_{\mathrm{com}}(X),\ 0 \le f \le I_U} \ell(f) \) if U is open and has compact closure. 


Identity (i) and an elementary but lengthy computation in elementary set theory 
together allow us to prove that μ is well defined on the ring K(X) under the 
definitions above. Once μ has been so extended, the right side of (ii) is just μ(U) 
if U is open with compact closure. Thus (iii) says that μ(U) is the supremum of 
μ(K) over compact sets K contained in U, provided U is open and has compact 
closure. Since μ(U) is trivially the infimum of μ(V) for open sets V in K(X) 
containing U, this is the regularity conclusion for U. It is easy to see that the 
subclass of K(X) for which regularity holds is a ring and contains the compact 
sets, and hence regularity is established for K(X). 

When the locally compact Hausdorff space X is a metric space, the three 
identities above are fairly easy to prove. When X is metric, any indicator function 
I_K for K compact is the pointwise decreasing limit of members of C_com(X) that are 
≥ 0. In fact, if D(·, K) is the distance to K, then the sequence {f_n} with f_n(x) = 
max{0, 1 − n D(x, K)} has the required properties. A little trick proves in this case 
that μ(K) = lim_n ℓ(f_n). To prove (i), we choose such sequences {f_n} and {g_n} 
for K_1 and K_2. If φ is a member of C_com(X) that is identically 1 on the union of the 
supports of f_1 and g_1, then f_n + g_n = min{f_n + g_n, φ} + (max{f_n + g_n, φ} − φ) 
decomposes f_n + g_n into the sum of such sequences for K_1 ∪ K_2 and K_1 ∩ K_2, 
and identity (i) follows from linearity of ℓ and a passage to the limit. Identities 
(ii) and (iii) follow from equally simple arguments. 
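
The metric-space argument can be checked numerically in the simplest setting. The following is a minimal sketch, not part of the text's proof, under these assumptions: X is the real line, ℓ is the Riemann integral (approximated by a midpoint rule), K_1 = [0, 1] and K_2 = [0.5, 2], and φ is replaced by the constant 1 on an integration window containing every support involved. All function names are hypothetical. For n ≥ 2 the four integrals work out by hand to 1 + 1/n, 1.5 + 1/n, 2 + 1/n, and 0.5 + 1/n, so that as n grows they tend to the lengths of K_1, K_2, K_1 ∪ K_2, and K_1 ∩ K_2.

```python
def dist_to_interval(x, a, b):
    """Distance D(x, K) from x to the compact interval K = [a, b]."""
    return max(a - x, 0.0, x - b)

def f_n(x, n, a, b):
    """The function f_n(x) = max{0, 1 - n D(x, K)} for K = [a, b]."""
    return max(0.0, 1.0 - n * dist_to_interval(x, a, b))

def riemann(func, lo, hi, steps=20_000):
    """Midpoint-rule approximation to the Riemann integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def metric_case_check(n, K1=(0.0, 1.0), K2=(0.5, 2.0), window=(-2.0, 4.0)):
    """Return (l(f_n), l(g_n), l(min term), l(max term)) for the given n >= 1.

    Assumption: phi is taken to be identically 1 on the integration window,
    which contains the supports of f_n and g_n, so phi is never needed
    outside the window.
    """
    f = lambda x: f_n(x, n, *K1)
    g = lambda x: f_n(x, n, *K2)
    phi = lambda x: 1.0
    lo, hi = window
    ell_f = riemann(f, lo, hi)
    ell_g = riemann(g, lo, hi)
    # min{f_n + g_n, phi} decreases to the indicator of K1 ∪ K2.
    ell_min = riemann(lambda x: min(f(x) + g(x), phi(x)), lo, hi)
    # max{f_n + g_n, phi} - phi decreases to the indicator of K1 ∩ K2.
    ell_max = riemann(lambda x: max(f(x) + g(x), phi(x)) - phi(x), lo, hi)
    return ell_f, ell_g, ell_min, ell_max

if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, metric_case_check(n))
```

For each n the first two values add up to the last two, which is the linearity step in the proof of identity (i); the passage to the limit then gives the identity for the lengths.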

The difficulty for a general locally compact Hausdorff space X is that the 
indicator function of a compact set need not be a pointwise decreasing limit of a 
sequence of continuous functions. The technicalities introduced by this fact have 
the effect of making the proofs of (i), (ii), and (iii) be more complicated, but these 
complications need not obscure the line of argument that is so clear in the metric 
case. 


2. Riesz Representation Theorem 


Throughout this section we fix the locally compact Hausdorff space X. We 
continue to let C_com(X) be the space of continuous functions of compact support, 
K(X) be the ring of elementary sets, and B(X) be the σ-algebra of Borel sets. 

A subset E of X is said to be bounded if it is contained in a compact set, 
hence if E^cl is compact; it is σ-bounded if it is contained in the countable union 
of compact sets. The class of all σ-bounded Borel sets is a σ-ring containing 
K(X), and it is therefore the smallest σ-ring containing K(X). 

A measure on the Borel sets of X is called a Borel measure if it is finite on 
every compact set. A Borel measure μ is said to be regular if it satisfies 


\[ \mu(E) = \sup_{K \subseteq E,\ K \text{ compact}} \mu(K) \quad \text{for every set } E \text{ in } \mathcal{B}(X), \]
\[ \mu(E) = \inf_{U \supseteq E,\ U \text{ open } \sigma\text{-bounded}} \mu(U) \quad \text{for every } \sigma\text{-bounded set } E \text{ in } \mathcal{B}(X). \]


Theorem 11.1 (Riesz Representation Theorem). If ℓ is a positive linear 
functional on C_com(X), then there exists a unique regular Borel measure μ on X such 
that 


\[ \ell(f) = \int_X f\,d\mu \quad \text{for all } f \in C_{\mathrm{com}}(X). \]
EXAMPLES. 

(1) If X is the line R^1 and ℓ is given by Riemann integration ℓ(f) = 
R∫_a^b f(x) dx whenever [a, b] contains the support of f, then ℓ is a positive 
linear functional on C_com(R^1) and the corresponding μ is Lebesgue measure. 

(2) If X = S^2 is the unit sphere in R^3, parametrized by latitude θ_1 from 0 to π 
and by longitude θ_2 from 0 to 2π, then ℓ(f) = R∫_0^π ∫_0^{2π} f(θ_1, θ_2) sin θ_1 dθ_2 dθ_1 
is a positive linear functional on C(S^2), and the corresponding measure, which is 
written dω in the same way that Lebesgue measure is written as dx, is a 
rotation-invariant measure on the sphere such that ∫_{R^3} F(x) dx = ∫_0^∞ ∫_{S^2} F(rω) r^2 dω dr 
for every nonnegative Borel function F on R^3. The proof of this identity and of the 
rotation invariance will be indicated in Problem 5 at the end of the chapter. 

(3) If X is general and if μ is a regular Borel measure on X, then ℓ(f) = 
∫_X f dμ is a positive linear functional on C_com(X). 
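
Example (2) lends itself to a quick numerical sanity check. The sketch below is an illustration under stated assumptions rather than a proof: the iterated Riemann integral is approximated by a midpoint rule, the radial integral is truncated at r = 8, and the test function is the radial Gaussian F(x) = exp(−|x|^2), whose integral over R^3 is π^{3/2}. The names `ell`, `radial_moment`, `total_mass`, and `gaussian` are hypothetical.

```python
import math

def ell(f, n1=200, n2=200):
    """Midpoint-rule value of the iterated integral
    ∫_0^π ∫_0^{2π} f(θ1, θ2) sin θ1 dθ2 dθ1."""
    h1 = math.pi / n1
    h2 = 2.0 * math.pi / n2
    total = 0.0
    for i in range(n1):
        t1 = (i + 0.5) * h1
        inner = sum(f(t1, (j + 0.5) * h2) for j in range(n2))
        total += inner * h2 * math.sin(t1)
    return total * h1

def radial_moment(g, R=8.0, steps=4000):
    """Midpoint-rule value of ∫_0^R g(r) r^2 dr, a truncation of ∫_0^∞."""
    h = R / steps
    return h * sum(g((k + 0.5) * h) * ((k + 0.5) * h) ** 2 for k in range(steps))

# Total mass of the sphere under dω; the exact value is 4π.
total_mass = ell(lambda t1, t2: 1.0)

# For a radial F the right side of the identity factors as
# (∫_0^∞ F(r) r^2 dr) · (total mass of dω).
gaussian = radial_moment(lambda r: math.exp(-r * r)) * total_mass

if __name__ == "__main__":
    print(total_mass, 4.0 * math.pi)
    print(gaussian, math.pi ** 1.5)
```

Positivity of ℓ is visible in the code: every summand is a value of f times the nonnegative weight sin θ_1, so a nonnegative f gives a nonnegative sum.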


The proof of Theorem 11.1 will occupy the remainder of this section. We begin 
with some lemmas clarifying the nature of the ring K(X), the linear functional 
ℓ, and general compact and open subsets of X. Then we recall the definition of 
μ(K) for compact sets and establish the identities (i), (ii), and (iii) in Section 1. 
Finally we give the details of how the three identities imply the theorem. 

We begin with information about the ring K(X). 


Lemma 11.2. The members of the ring K(X) are exactly all finite disjoint 
unions of subsets V of X of the form V = K − L with K and L compact and L ⊆ K. 
The ring K(X) may be characterized also as the smallest ring containing all 
bounded open subsets of X. 


PROOF. If K_1 − L_1 and K_2 − L_2 are two sets of the same kind as V in the 
statement of the lemma, then the identity 


\[ (K_1 - L_1) \cup (K_2 - L_2) = \big((K_1 \cup K_2) - (L_1 \cup L_2)\big) \cup \big((K_2 \cap L_1) - (L_1 \cap L_2)\big) \cup \big((K_1 \cap L_2) - (L_1 \cap L_2)\big) \]

shows that a union of two such sets is a disjoint union of such sets, and the identity 

\[ (K_1 - L_1) - (K_2 - L_2) = \big((K_1 \cap L_2) - (L_1 \cap L_2)\big) \cup \big(K_1 - (L_1 \cup (K_1 \cap K_2))\big) \]

shows that the difference of two such sets is a disjoint union of such sets. Therefore the collection of 
all finite disjoint unions of such sets is a ring of subsets of X. This ring contains all compact sets because 
any compact set K is of the form K − ∅, and hence this ring equals K(X). 

Any open bounded set U is the difference of the compact sets U^cl and U^cl − U, 
and hence it lies in K(X). In the reverse direction Corollary 10.23 shows that any 
compact set K is contained in the interior L^o of some compact set L. Thus K is 
the difference of the bounded open sets L^o and L^o − K, and K(X) is contained 
in the smallest ring containing all bounded open sets. 
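
The two set identities in the proof are purely set-theoretic, so they can be tested by brute force on finite sets. The following is a small sketch under the assumption that the sets are random subsets of a 12-point universe with L_1 ⊆ K_1 and L_2 ⊆ K_2 (in a finite discrete space every subset is compact); the function name is hypothetical.

```python
import random

def check_ring_identities(trials=1000, universe=range(12), seed=0):
    """Check the union and difference identities, and the disjointness of
    the pieces on their right sides, on randomly chosen finite sets."""
    rng = random.Random(seed)
    pts = list(universe)
    for _ in range(trials):
        K1 = {x for x in pts if rng.random() < 0.6}
        K2 = {x for x in pts if rng.random() < 0.6}
        L1 = {x for x in K1 if rng.random() < 0.5}  # L1 is a subset of K1
        L2 = {x for x in K2 if rng.random() < 0.5}  # L2 is a subset of K2
        A, B = K1 - L1, K2 - L2

        # Union identity: three pieces, pairwise disjoint.
        p1 = (K1 | K2) - (L1 | L2)
        p2 = (K2 & L1) - (L1 & L2)
        p3 = (K1 & L2) - (L1 & L2)
        if (A | B) != (p1 | p2 | p3):
            return False
        if (p1 & p2) or (p1 & p3) or (p2 & p3):
            return False

        # Difference identity: two pieces, disjoint.
        q1 = (K1 & L2) - (L1 & L2)
        q2 = K1 - (L1 | (K1 & K2))
        if (A - B) != (q1 | q2) or (q1 & q2):
            return False
    return True

if __name__ == "__main__":
    print(check_ring_identities())
```

A random test of this kind does not replace the element-chasing argument, but it is a convenient guard against a misplaced subscript in either identity.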
Next we observe some properties of the linear functional ℓ. It is to be 
understood throughout the section that ℓ is a positive linear functional on C_com(X). The 
positivity implies that ℓ(f − g) ≥ 0 if f − g ≥ 0; the linearity therefore gives 
ℓ(f) ≥ ℓ(g) for f ≥ g. The linear functional has a kind of continuity property, 
according to the following lemma. 


Lemma 11.3. Let K be a compact set, and let {f_n} be a sequence in C_com(X) 
converging uniformly to a member f of C_com(X) in such a way that support(f_n) ⊆ 
K for all n. Then lim_n ℓ(f_n) exists and equals ℓ(f). 


PROOF. Corollaries 10.23 and 10.44 show that there exists a function F in 
C_com(X) such that F takes values in [0, 1] and is 1 on K. Since f_n − f ≤ |f_n − f| 
and −(f_n − f) ≤ |f_n − f|, we have 


\[ |\ell(f_n) - \ell(f)| = |\ell(f_n - f)| \le \ell(|f_n - f|) \le \ell(c_n F) = c_n\,\ell(F), \]


where c_n = ||f_n − f||_sup. The assumed uniform convergence means that c_n 
tends to 0. Since ℓ(F) is some fixed constant, the asserted convergence of ℓ(f_n) 
follows. 
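
The estimate in the proof can be watched in a concrete case. The sketch below makes several assumptions that are not in the text: X is the real line, ℓ is the Riemann integral approximated by a midpoint rule, K = [−1, 1], f(x) = max{0, 1 − x^2}, f_n = f · (1 + cos(nx)/n), and F is a trapezoidal function equal to 1 on K with support [−2, 2], so that ℓ(F) = 3. The sup norm c_n is estimated on a grid that contains x = 0, where the maximum 1/n is attained. All names are hypothetical.

```python
import math

def riemann(func, lo, hi, steps=20_000):
    """Midpoint-rule approximation to the Riemann integral of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))

def f(x):
    """Limit function, continuous with support K = [-1, 1]."""
    return max(0.0, 1.0 - x * x)

def f_n(x, n):
    """Functions supported in K that converge uniformly to f as n grows."""
    return f(x) * (1.0 + math.cos(n * x) / n)

def F(x):
    """Continuous, values in [0, 1], equal to 1 on K, supported in [-2, 2]."""
    return max(0.0, min(1.0, 2.0 - abs(x)))

def lemma_11_3_check(n, grid=4001):
    """Return (|l(f_n) - l(f)|, c_n * l(F), c_n) with c_n estimated on a grid."""
    xs = [-1.0 + 2.0 * k / (grid - 1) for k in range(grid)]
    c_n = max(abs(f_n(x, n) - f(x)) for x in xs)
    lhs = abs(riemann(lambda x: f_n(x, n), -2.0, 2.0) - riemann(f, -2.0, 2.0))
    rhs = c_n * riemann(F, -2.0, 2.0)
    return lhs, rhs, c_n

if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, lemma_11_3_check(n))
```

In this example the left side is much smaller than the bound c_n ℓ(F), which is typical: the lemma only needs the bound to tend to 0.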


Lemma 11.4 (Dini’s Theorem). If {f_n} is a sequence of functions in C_com(X) 
decreasing pointwise to 0, then {f_n} converges uniformly to 0. 


PROOF. Because of the pointwise decrease to 0, all the functions f_n have 
support contained in the compact set K = support(f_1). Let ε > 0 be given, and 
let U_n be the open set where the continuous function f_n is < ε. The pointwise 
decrease implies that the U_n are increasing with n, and the limit of 0 implies that 
each x in K is in some U_n. Thus the open sets U_n form an open cover of K. By 
compactness, there is a finite subcover. Since the sets U_n are increasing, some 
particular U_N covers K. Then ||f_n||_sup < ε for n ≥ N. 

The final step of preparation is to observe some properties of compact and open 
sets. A bounded subset of X is said to be a G_δ if it is the countable intersection of 
bounded open sets. It is said to be an F_σ if it is the countable union of compact 
sets. We shall be especially interested in compact G_δ’s and in open bounded F_σ’s. 


Lemma 11.5. Let f be a member of C_com(X) with values in [0, 1]. If r > 0, 
then the set where f is ≥ r is a compact G_δ. If r ≥ 0, then the set where f is 
> r is a bounded open F_σ. 


PROOF. The set where f is ≥ r is closed because of continuity, and this closed 
set is a subset of the compact support. Hence the set is compact. Similarly the 
set where f is > r is open because of continuity, and this open set is a subset of 
the compact support. Hence the set is bounded. 
When r ≥ 0, the set where f is > r is the union, for n ≥ 1, of the sets where 
f is ≥ r + 1/n. For r > 0 when N is large enough so that r − 1/N > 0, the set 
where f is ≥ r is the intersection, for n ≥ N, of the sets where f is > r − 1/n. 
The lemma follows. 


Lemma 11.6. 

(a) If K is a compact G_δ, then there exists a decreasing sequence of bounded 
open sets U_n such that U_n ⊇ (U_{n+1})^cl for all n and ⋂_{n=1}^∞ U_n = K. 
(b) If U is a bounded open F_σ, then there exists an increasing sequence of 
compact sets K_n such that K_n ⊆ (K_{n+1})^o for all n and ⋃_{n=1}^∞ K_n = U. 


PROOF. For (a), let {V_n} be a sequence of bounded open sets with intersection 
K. This is possible since K is a G_δ. Without loss of generality we may assume 
that the V_n decrease with n. We define the sequence {U_n} inductively on n. Put 
U_1 = V_1. If U_n has been constructed, use Corollary 10.22 to find an open set V_n′ 
such that K ⊆ V_n′ and (V_n′)^cl ⊆ U_n, and then define U_{n+1} = V_n′ ∩ V_{n+1}. Then the 
sets U_n have the required properties. 

For (b), let {L_n} be a sequence of compact sets with union U. This is possible 
since U is an F_σ. Without loss of generality we may assume that the L_n increase 
with n. We define the sequence {K_n} inductively on n. Put K_1 = L_1. If K_n has 
been constructed, use Corollary 10.23 to find an open set V_n′ such that U ⊇ (V_n′)^cl 
and V_n′ ⊇ K_n. The compact set L_n′ = (V_n′)^cl has (L_n′)^o ⊇ V_n′. If we define 
K_{n+1} = L_n′ ∪ L_{n+1}, then the sets K_n have the required properties. 


Lemma 11.7. 

(a) If K is a compact G_δ, then there exists a decreasing sequence of functions 
f_n in C_com(X) with values in [0, 1] such that each f_n is 1 on some neighborhood 
of K and lim f_n = I_K pointwise. 

(b) If U is a bounded open F_σ, then there exists an increasing sequence of 
functions f_n in C_com(X) with values in [0, 1] such that each f_n has compact 
support contained in U and lim f_n = I_U pointwise. 


PROOF. For (a), apply Lemma 11.6a to choose a sequence of bounded open 
sets U_n with intersection K such that U_n ⊇ (U_{n+1})^cl for all n. Using Corollary 
10.44, let g_n be a member of C_com(X) with values in [0, 1] such that g_n is 1 on 
(U_{n+1})^cl and is 0 off U_n, and put f_n = min{g_1, ..., g_n}. Then the functions f_n have 
the required properties. 

For (b), apply Lemma 11.6b to choose a sequence of compact sets K_n with 
union U such that K_n ⊆ (K_{n+1})^o for all n. Using Corollary 10.44, let g_n be a 
member of C_com(X) with values in [0, 1] such that g_n is 1 on K_n and is 0 off 
(K_{n+1})^o, and put f_n = max{g_1, ..., g_n}. Then the functions f_n have the required 
properties. 
Now we begin the proofs of the three identities in Section 1. If K is compact, 
let 

\[ \mu(K) = \inf \ell(f), \]

the infimum being taken over all f in C_com(X) such that f ≥ I_K. Since 
ℓ(min{f, 1}) ≤ ℓ(f), there is no harm in considering only those f’s taking 
values in [0, 1]. It is immediate from this definition and the positivity of ℓ that μ 
is nonnegative and monotone in the sense that K′ ⊆ K implies μ(K′) ≤ μ(K). 
The next lemma is the key to being able to prove the three identities in Section 1. 


Lemma 11.8. If K is a compact subset of X, then the infimum of ℓ(f) over 
all f in C_com(X) such that f ≥ I_K equals the infimum of ℓ(f) over all f in 
C_com(X) with values in [0, 1] such that f ≥ I_N for some neighborhood N of K 
depending on f. 


REMARK. In particular, μ(K) can be computed by using only functions f ≥ I_K 
that are equal to 1 in some neighborhood of K. 


PROOF. The problem is to show that the first infimum I_1 is not less than the 
second infimum I_2. Let ε > 0 be given. Choose f in C_com(X) with values in 
[0, 1] such that f ≥ I_K and ℓ(f) ≤ I_1 + ε, and let L be the set where f is 
≥ 1. Lemma 11.5 shows that L is a compact G_δ, and Lemma 11.7a produces a 
decreasing sequence of functions f_n in C_com(X) with values in [0, 1] such that 
each f_n is 1 on some neighborhood of L and lim f_n = I_L pointwise. Then the 
sequence {max{f_n, f}} is pointwise decreasing with limit max{I_L, f} = f, and 
hence {max{f_n, f} − f} is a pointwise decreasing sequence in C_com(X) with 
limit 0. By Dini’s Theorem (Lemma 11.4), the sequence {max{f_n, f} − f} 
converges uniformly to 0, and hence, by Lemma 11.3, ℓ(max{f_n, f}) decreases to ℓ(f). For some 
sufficiently large n_0, we therefore have ℓ(max{f_{n_0}, f}) ≤ I_1 + 2ε. The function 
max{f_{n_0}, f} is one of the functions that figures into I_2, and thus I_2 ≤ I_1 + 2ε. 
Since ε is arbitrary, I_2 ≤ I_1. 


Lemma 11.8 puts us in a position to prove identity (i) in Section 1 and to deduce 
that μ extends in a well-defined fashion to a nonnegative additive set function on 
K(X). We make use of the formula a + b = min{a, b} + max{a, b}, from which 
it follows that a = min{a, b} + (max{a, b} —b). 


Lemma 11.9. If K_1 and K_2 are any two compact subsets of X, then 
μ(K_1) + μ(K_2) = μ(K_1 ∪ K_2) + μ(K_1 ∩ K_2). 


REMARK. The argument in Lemma 11.8 adapts to give a quick proof of the 
present lemma when X is a metric space. In the metric case we can find a 
decreasing sequence {f_n} of functions ≤ 1 in C_com(X) with pointwise limit I_{K_1}. If 
f ≥ I_{K_1}, then the proof of Lemma 11.8 shows that max{f_n, f} converges uniformly 
to f and hence ℓ(max{f_n, f}) decreases to ℓ(f). It follows that ℓ(f_n) decreases to 
μ(K_1) whenever f_n decreases to I_{K_1}. If we similarly choose {g_n} decreasing 
to I_{K_2} and choose, by Corollary 10.44, a function φ ∈ C_com(X) with values in 
[0, 1] that is identically 1 on the support of f_1 + g_1, then the formula stated just 
above shows that f_n + g_n = min{f_n + g_n, φ} + (max{f_n + g_n, φ} − φ). The 
first term on the right side decreases pointwise to I_{K_1 ∪ K_2}, and the second term 
decreases to I_{K_1 ∩ K_2}. Thus a passage to the limit in the formula ℓ(f_n) + ℓ(g_n) = 
ℓ(min{f_n + g_n, φ}) + ℓ(max{f_n + g_n, φ} − φ) immediately yields the result 
of the present lemma. 


PROOF. Let $f$ and $g$ be functions in $C_{\mathrm{com}}(X)$ with values in $[0,1]$ such that
$f \ge I_{K_1}$ and $g \ge I_{K_2}$, and choose, by Corollary 10.44, $\varphi \in C_{\mathrm{com}}(X)$ with
values in $[0,1]$ that is identically 1 on the support of $f + g$. Then we have
$f + g = \min\{f + g, \varphi\} + (\max\{f + g, \varphi\} - \varphi)$. The first term on the right side
is $\ge I_{K_1 \cup K_2}$, and the second term is $\ge I_{K_1 \cap K_2}$. Therefore
\[
\ell(f) + \ell(g) = \ell(\min\{f + g, \varphi\}) + \ell(\max\{f + g, \varphi\} - \varphi)
\ge \mu(K_1 \cup K_2) + \mu(K_1 \cap K_2).
\]
Taking the infimum over $f$ and then over $g$, we obtain
\[
\mu(K_1) + \mu(K_2) \ge \mu(K_1 \cup K_2) + \mu(K_1 \cap K_2).
\]


For the reverse direction let $F$ be a member of $C_{\mathrm{com}}(X)$ with values in $[0,1]$
that is $\ge I_{K_1 \cup K_2}$ and is equal to 1 at least on some open set $U$ containing $K_1 \cup K_2$.
Similarly let $G$ be a member of $C_{\mathrm{com}}(X)$ with values in $[0,1]$ that is $\ge I_{K_1 \cap K_2}$
and is equal to 1 at least on some open set $V$ containing $K_1 \cap K_2$. Lemma 11.8
shows that $F$ and $G$ are the most general functions of a kind needed for the
computation of $\mu(K_1 \cup K_2)$ and $\mu(K_1 \cap K_2)$. The sets $U$ and $V$ have compact
closure in $X$ since they are subsets of the supports of $F$ and $G$. Choose, by
Corollary 10.44, $\varphi \in C_{\mathrm{com}}(X)$ with values in $[0,1]$ that is identically 1 on the
support of $F + G$. Let $V_0$ be an open set with $K_1 \cap K_2 \subseteq V_0 \subseteq V_0^{\mathrm{cl}} \subseteq V$. Then
$(K_2 - V_0) \cap K_1 = K_2 \cap V_0^c \cap K_1 \subseteq V_0 \cap V_0^c = \varnothing$. So there exists an open set
$W$ such that $K_2 - V_0 \subseteq W \subseteq W^{\mathrm{cl}} \subseteq K_1^c$.

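We define $f$ and $g$ to be members of $C_{\mathrm{com}}(X)$ having compact support contained
in $U$ and having values in $[0,1]$ such that
\[
f = \begin{cases} 1 & \text{on } K_1, \\ 0 & \text{on } W^{\mathrm{cl}}, \end{cases}
\qquad \text{and} \qquad
g = \begin{cases} 1 & \text{on } K_2, \\ 0 & \text{on } \operatorname{support}(f) - V. \end{cases}
\]
The functions $f$ and $g$ exist by Corollary 10.44 if it is shown that the closed sets
$K_1$ and $W^{\mathrm{cl}}$ are disjoint and the closed sets $K_2$ and $\operatorname{support}(f) - V$ are disjoint.
The sets $K_1$ and $W^{\mathrm{cl}}$ are disjoint since $W^{\mathrm{cl}} \subseteq K_1^c$. For $K_2$ and $\operatorname{support}(f) - V$,
we observe that $\operatorname{support}(f) \subseteq ((W^{\mathrm{cl}})^c)^{\mathrm{cl}} \subseteq (W^c)^{\mathrm{cl}} = W^c \subseteq (K_2 - V_0)^c =
V_0 \cup K_2^c \subseteq V \cup K_2^c$. Therefore
\[
(\operatorname{support}(f) - V) \cap K_2 \subseteq (V \cup K_2^c) \cap V^c \cap K_2 = \varnothing.
\]
We conclude that $f$ and $g$ exist.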
By inspection, $f \ge I_{K_1}$ and $g \ge I_{K_2}$, from which $f + g \ge I_{K_1} + I_{K_2}$. Then
$\min\{f + g, \varphi\}$ is 1 on $K_1 \cup K_2$ and is 0 off $U$. Since $F$ is 1 on $U$, we obtain
\[
\min\{f + g, \varphi\} \le F. \tag{$*$}
\]
Since $f + g \ge I_{K_1} + I_{K_2} = I_{K_1 \cup K_2} + I_{K_1 \cap K_2}$, the function $\max\{f + g, \varphi\} - \varphi$
equals $f + g - 1$ on $K_1 \cup K_2$, and this in turn is $\le 1$ everywhere. Let us see that
\[
\max\{f + g, \varphi\} - \varphi \le G \tag{$**$}
\]
everywhere. The only points $x$ at which $(**)$ could possibly fail are those where
$G(x) < 1$, hence points of $V^c$. At such points the definition of $g$ shows that
$f(x) + g(x) \le 1$. If also $x$ is in $U$, then $\varphi(x) = 1$ and we compute that
$\max\{f(x) + g(x), \varphi(x)\} - \varphi(x) = 1 - 1 = 0$. Thus $(**)$ holds at points of
$U \cap V^c$. At points of $U^c \cap V^c$, the equality $f(x) = g(x) = 0$ implies that
$\max\{f(x) + g(x), \varphi(x)\} - \varphi(x) = \varphi(x) - \varphi(x) = 0$. Thus again $(**)$ holds,
and hence $(**)$ holds at every point of $V^c$, therefore everywhere.
Addition of $(*)$ and $(**)$ gives $f + g \le F + G$ everywhere. Therefore
\[
\ell(F) + \ell(G) = \ell(F + G) \ge \ell(f + g) = \ell(f) + \ell(g) \ge \mu(K_1) + \mu(K_2).
\]
Taking the infimum over $F$ and then over $G$ gives $\mu(K_1 \cup K_2) + \mu(K_1 \cap K_2) \ge
\mu(K_1) + \mu(K_2)$ and completes the proof of the lemma.


Lemma 11.9 yields by iteration a corresponding formula with the sum of n 
terms on each side. This extension of Lemma 11.9 is a computation in Boolean 
algebra involving no analysis at all—only the fact that the collection of compact 
sets is closed under finite unions and intersections. The details are carried out in 
the next lemma. 
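Lemma 11.10. If $K_1, \dots, K_n$ are compact subsets of $X$, then
\[
\sum_{l=1}^{n} \mu(K_l) = \sum_{k=1}^{n} \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_k \le n} \Bigl( \bigcap_{j=1}^{k} K_{i_j} \Bigr) \Bigr).
\]

PROOF. The argument is by induction on $n$, the base case of the induction being
the case $n = 2$ that was settled by Lemma 11.9. Thus let $n > 2$, and assume the
identity for the case $n - 1$. The inductive hypothesis gives
\[
\sum_{l=1}^{n} \mu(K_l) = \sum_{k=1}^{n-1} \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_k < n} \Bigl( \bigcap_{j=1}^{k} K_{i_j} \Bigr) \Bigr) + \mu(K_n). \tag{$*$}
\]
We shall prove by induction on $r \ge 1$ that
\[
\sum_{l=1}^{n} \mu(K_l) = \sum_{k=1}^{r-1} \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_k \le n} \Bigl( \bigcap_{j=1}^{k} K_{i_j} \Bigr) \Bigr)
+ \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_r = n} \Bigl( \bigcap_{j=1}^{r} K_{i_j} \Bigr) \Bigr)
+ \sum_{k=r}^{n-1} \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_k < n} \Bigl( \bigcap_{j=1}^{k} K_{i_j} \Bigr) \Bigr),
\]
the base case of this induction being $r = 1$, where this identity reduces to $(*)$. The
proof for the case $r = n$ will complete the inductive step for the outer induction
and thereby will complete the proof of the lemma. To pass from $r$ to $r + 1$ in the
inner induction, the question is whether
\[
\mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_r = n} \Bigl( \bigcap_{j=1}^{r} K_{i_j} \Bigr) \Bigr)
+ \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_r < n} \Bigl( \bigcap_{j=1}^{r} K_{i_j} \Bigr) \Bigr)
= \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_r \le n} \Bigl( \bigcap_{j=1}^{r} K_{i_j} \Bigr) \Bigr)
+ \mu\Bigl( \bigcup_{1 \le i_1 < \cdots < i_{r+1} = n} \Bigl( \bigcap_{j=1}^{r+1} K_{i_j} \Bigr) \Bigr).
\]
The union of the two sets on the left here is the first set on the right side. In view
of Lemma 11.9, this formula will follow if it is shown that the second set on the
right side is the intersection of the two sets on the left. The intersection of the
two sets on the left side is equal to
\[
\bigcup_{\substack{1 \le i_1 < \cdots < i_r = n, \\ 1 \le i'_1 < \cdots < i'_r < n}} \Bigl( \Bigl( \bigcap_{j=1}^{r} K_{i_j} \Bigr) \cap \Bigl( \bigcap_{j=1}^{r} K_{i'_j} \Bigr) \Bigr). \tag{$**$}
\]
A term in the union in this expression is an intersection of at least $r + 1$ of the sets
$K_1, \dots, K_n$, the last of which is $K_n$, namely the ones corresponding to indices
$i'_1, \dots, i'_r$ and $i_r = n$. Every intersection of exactly $r + 1$ of the sets $K_1, \dots, K_n$
occurs if the last one is $K_n$, because we can take $i_1 = i'_1, \dots, i_{r-1} = i'_{r-1}$. Any
intersection of more than $r + 1$ sets is contained in one with exactly $r + 1$ sets,
and thus $(**)$ equals $\bigcup_{1 \le i_1 < \cdots < i_{r+1} = n} \bigl( \bigcap_{j=1}^{r+1} K_{i_j} \bigr)$, as asserted.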
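Because Lemma 11.10 is pure Boolean algebra, it can be sanity-checked with any additive set function. The sketch below is only an illustration under stated assumptions: it replaces $\mu$ by the counting measure on finite Python sets, and the helper names `lhs`, `at_least_k`, and `rhs` are hypothetical. The $k$th term on the right side is the measure of the set of points lying in at least $k$ of the sets.

```python
from itertools import combinations


def lhs(sets):
    """Sum of the counting measures of the individual sets."""
    return sum(len(K) for K in sets)


def at_least_k(sets, k):
    """Union, over all i_1 < ... < i_k, of K_{i_1} & ... & K_{i_k}."""
    out = set()
    for combo in combinations(sets, k):
        out |= set.intersection(*combo)
    return out


def rhs(sets):
    """Sum over k = 1..n of the counting measure of at_least_k(sets, k)."""
    return sum(len(at_least_k(sets, k)) for k in range(1, len(sets) + 1))


example = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
```

For `example`, the three layers are `{1, 2, 3, 4, 5}`, `{2, 3}`, and `{3}`, whose sizes add up to the same total as the sizes of the three given sets.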
A further formality is the derivation from these results that $\mu$ extends in a
well-defined fashion to a nonnegative additive set function on the ring $\mathcal{K}(X)$. Again
no analysis is involved, only the one additional fact that the intersection of two
sets of the form $K - L$ with $K$ and $L$ compact is again of this form, specifically
that $(K - L) \cap (K' - L') = (K \cap K') - (L \cup L')$.
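The set identity just stated is elementary and can be spot-checked on finite sets. This is a minimal sketch with hypothetical helper names; the second helper also intersects the subtracted set with $K \cap K'$, which is the normalization used when one wants the subtracted set to be a subset of the first set, as in the statement of the lemma that follows.

```python
def intersect_differences(K, L, Kp, Lp):
    """Return both sides of (K - L) & (K' - L') = (K & K') - (L | L')."""
    left = (K - L) & (Kp - Lp)
    right = (K & Kp) - (L | Lp)
    return left, right


def normalized_pair(K, L, Kp, Lp):
    """Rewrite the intersection as K2 - L2 with L2 a subset of K2."""
    K2 = K & Kp
    L2 = (L | Lp) & K2
    return K2, L2


K, L = {1, 2, 3, 4}, {1}
Kp, Lp = {2, 3, 4, 5}, {4, 5}
left, right = intersect_differences(K, L, Kp, Lp)
K2, L2 = normalized_pair(K, L, Kp, Lp)
```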


Lemma 11.11. The set function $\mu$ extends in a well-defined fashion to a
nonnegative additive set function on $\mathcal{K}(X)$ under the definition
\[
\mu\Bigl( \bigcup_{j=1}^{n} (K_j - L_j) \Bigr) = \sum_{j=1}^{n} \bigl( \mu(K_j) - \mu(L_j) \bigr)
\]
whenever $K_j$ and $L_j$ are compact with $L_j \subseteq K_j$ for each $j$ with $1 \le j \le n$ and
the sets $K_1 - L_1, \dots, K_n - L_n$ are pairwise disjoint.


REMARKS. Lemma 11.2 assures us that every member of $\mathcal{K}(X)$ is of the form
in this lemma. The subtlety of the lemma arises from the fact that the sets $K_j$
need not be disjoint.


PROOF. First let us see that $\mu$ is well defined in the case $n = 1$, i.e., that
$K' - L' = K - L$ with $L' \subseteq K'$ and $L \subseteq K$ implies $\mu(K') - \mu(L') =
\mu(K) - \mu(L)$. We are to show that $\mu(K') + \mu(L) = \mu(K) + \mu(L')$, and Lemma
11.9 shows that it is enough to show that $K' \cup L = K \cup L'$ and $K' \cap L = K \cap L'$.
Suppose $x$ is in $K' \cup L$. If $x$ is in $L$, then $x$ is in $K$, hence in $K \cup L'$. If $x$ is in $K'$
instead, then either $x$ has to be in $L'$ in the case that $x$ is not in $K' - L'$ or $x$ has
to be in $K$ in the case that $x$ is in $K' - L' = K - L$. So $K' \cup L \subseteq K \cup L'$. If $x$
is in $K' \cap L$, then $x$ is not in $K - L$ and must be in $L'$ in order to avoid being in
$K' - L'$. So $x$ is in $L \cap L' \subseteq K \cap L'$. Reversing the roles of $K' - L'$ and $K - L$,
we see that $K' \cup L = K \cup L'$ and $K' \cap L = K \cap L'$.

Next suppose that $K' - L' = \bigcup_{j=1}^{n} (K_j - L_j)$ with $L' \subseteq K'$, $L_j \subseteq K_j$ for
each $j$, and the sets $K_j - L_j$ disjoint. We are to show that $\mu(K') - \mu(L') =
\sum_{j=1}^{n} (\mu(K_j) - \mu(L_j))$, i.e., that $\mu(K') + \sum_{j=1}^{n} \mu(L_j) = \mu(L') + \sum_{j=1}^{n} \mu(K_j)$.
The argument will generalize that in the previous paragraph: The set $K' - L'$ has
complement $L' \cup K'^c$, and therefore the given condition of disjointness means
that
\[
X = (L' \cup K'^c) \cup \bigcup_{j=1}^{n} (K_j - L_j) \tag{$*$}
\]
disjointly. Put $L_{n+1} = K'$ and $K_{n+1} = L'$, so that we are asking whether
\[
\sum_{j=1}^{n+1} \mu(L_j) = \sum_{j=1}^{n+1} \mu(K_j).
\]
In view of Lemma 11.10, it would be enough to show that
\[
\bigcup_{1 \le i_1 < \cdots < i_k \le n+1} \Bigl( \bigcap_{j=1}^{k} L_{i_j} \Bigr) = \bigcup_{1 \le i_1 < \cdots < i_k \le n+1} \Bigl( \bigcap_{j=1}^{k} K_{i_j} \Bigr)
\]
for $1 \le k \le n + 1$. The left side is the set of $x$ lying in at least $k$ of the sets
$L_j$, and the right side is the corresponding set for the $K_j$'s. Thus it is enough to
prove that the set of $x$ lying in exactly $r$ sets $K_j$ is contained in the set of $x$ lying
in exactly $r$ sets $L_j$, for $1 \le r \le n + 1$.

We check this condition separately for the three cases $x \in L'$, $x \notin K'$, and
$x \in K' - L'$. From $(*)$ we see that $x$ in $L' \cup K'^c$ implies that $x$ is not in any
$K_j - L_j$ for $1 \le j \le n$. Hence for the first two cases, $x$ is in $L_j$ with $1 \le j \le n$
if and only if $x$ is in $K_j$.


Case 1. $x \in L'$. For $x$ to be in $r$ of the sets $K_1, \dots, K_{n+1}$, $x$ must be in $r - 1$
of the sets $K_1, \dots, K_n$, hence in $r - 1$ of the sets $L_1, \dots, L_n$. Since $x$ is in $L'$, it
is in $K' = L_{n+1}$. Therefore $x$ is in $r$ of the sets $L_1, \dots, L_{n+1}$.

Case 2. $x \notin K'$. For $x$ to be in $r$ of the sets $K_1, \dots, K_{n+1}$, $x$ must be in $r$ of
the sets $K_1, \dots, K_n$, hence in $r$ of the sets $L_1, \dots, L_n$. Since $x$ is not in $K'$, it is
not in $L_{n+1}$. Therefore $x$ is in $r$ of the sets $L_1, \dots, L_{n+1}$.

Case 3. $x \in K' - L'$. Since $x$ is not in $L' \cup K'^c$, $(*)$ shows that $x$ is in exactly
one $K_j - L_j$ with $1 \le j \le n$. For $x$ to be in $r$ of the sets $K_1, \dots, K_{n+1}$, $x$ must
be in $r$ of the sets $K_1, \dots, K_n$, hence in $r - 1$ of the sets $L_1, \dots, L_n$. Since $x$ is
in $K' = L_{n+1}$, it is in $r$ of the sets $L_1, \dots, L_{n+1}$.


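For the general case, suppose that $\bigcup_{i=1}^{m} (K'_i - L'_i) = \bigcup_{j=1}^{n} (K_j - L_j)$.
Intersecting both sides with $K'_i - L'_i$, we obtain
\[
K'_i - L'_i = \bigcup_{j=1}^{n} \Bigl( (K_j \cap K'_i) - \bigl( (L_j \cup L'_i) \cap (K_j \cap K'_i) \bigr) \Bigr).
\]
The case just proved shows that
\[
\mu(K'_i - L'_i) = \sum_{j=1}^{n} \Bigl( \mu(K_j \cap K'_i) - \mu\bigl( (L_j \cup L'_i) \cap (K_j \cap K'_i) \bigr) \Bigr)
\]
and hence
\[
\sum_{i=1}^{m} \mu(K'_i - L'_i) = \sum_{i=1}^{m} \sum_{j=1}^{n} \Bigl( \mu(K_j \cap K'_i) - \mu\bigl( (L_j \cup L'_i) \cap (K_j \cap K'_i) \bigr) \Bigr).
\]
Similarly
\[
\sum_{j=1}^{n} \mu(K_j - L_j) = \sum_{j=1}^{n} \sum_{i=1}^{m} \Bigl( \mu(K_j \cap K'_i) - \mu\bigl( (L_j \cup L'_i) \cap (K_j \cap K'_i) \bigr) \Bigr).
\]
Therefore $\sum_{i=1}^{m} \mu(K'_i - L'_i) = \sum_{j=1}^{n} \mu(K_j - L_j)$, and the proof is complete.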
In short order, we can now prove identities (ii) and (iii). Lemma 11.12 will
prove (iii), and Lemma 11.13 will prove (ii).


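Lemma 11.12. If $U$ is any bounded open subset of $X$, then
\[
\sup_{\substack{g \in C_{\mathrm{com}}(X), \\ 0 \le g \le I_U, \\ \operatorname{support} g \subseteq U}} \ell(g)
= \sup_{\substack{K \subseteq U, \\ K \text{ compact}}} \mu(K)
= \sup_{\substack{f \in C_{\mathrm{com}}(X), \\ 0 \le f \le I_U}} \ell(f).
\]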


PROOF. Let $S_1, S_2, S_3$ be the three suprema in question. We first check that
$S_1 \le S_2 \le S_3$. If $g$ contributes to $S_1$, then $g \le I_{\operatorname{support} g} \le I_U$. If $h \in C_{\mathrm{com}}(X)$
has $I_{\operatorname{support} g} \le h$, then $g \le h$ and hence $\ell(g) \le \ell(h)$. Taking the infimum over
all such $h$, we obtain $\ell(g) \le \mu(\operatorname{support} g) \le S_2$. Taking the supremum over all
$g$ therefore gives $S_1 \le S_2$. Next if $K$ is compact with $K \subseteq U$, Corollary 10.44
allows us to find $f \in C_{\mathrm{com}}(X)$ with values in $[0,1]$ such that $f$ is equal to 1 on $K$
and equal to 0 on $U^c$. Then $I_K \le f \le I_U$. The definitions of $\mu(K)$ and $S_3$ yield
$\mu(K) \le \ell(f) \le S_3$. Taking the supremum over all $K$ therefore gives $S_2 \le S_3$.

To complete the proof, we show that $S_1 \ge S_3$. Let $\epsilon > 0$ be given. Choose
$f$ in $C_{\mathrm{com}}(X)$ such that $0 \le f \le I_U$ and $\ell(f) \ge S_3 - \epsilon$, and let $V$ be the set
where $f$ is $> 0$. Lemma 11.5 shows that $V$ is a bounded open $F_\sigma$, and Lemma
11.7b produces an increasing sequence of functions $f_n$ in $C_{\mathrm{com}}(X)$ with values
in $[0,1]$, each with support some compact subset of $V$, such that $\lim f_n = I_V$
pointwise. Then the sequence $\{\min\{f_n, f\}\}$ is pointwise increasing with limit
$\min\{I_V, f\}$. If $x$ is a point where $I_V(x) < f(x)$, then $f(x) > 0$, $x$ is in $V$,
and $I_V(x) = 1$, contradiction. So there is no such point, and $\min\{I_V, f\} = f$.
Therefore the sequence $\{f - \min\{f_n, f\}\}$ is a pointwise decreasing sequence
in $C_{\mathrm{com}}(X)$ with limit 0. By Dini's Theorem (Lemma 11.4), the sequence
$\{f - \min\{f_n, f\}\}$ converges uniformly to 0, and hence $\ell(\min\{f_n, f\})$ increases to
$\ell(f)$. For some sufficiently large $n_0$, we therefore have $\ell(\min\{f_{n_0}, f\}) \ge S_3 - 2\epsilon$.
The function $\min\{f_{n_0}, f\}$ is one of the functions that figures into $S_1$, and thus
$S_1 \ge \ell(\min\{f_{n_0}, f\}) \ge S_3 - 2\epsilon$. Since $\epsilon$ is arbitrary, $S_1 \ge S_3$.


Lemma 11.13. Let $\mu$ be extended to a nonnegative additive set function on
$\mathcal{K}(X)$ as in Lemma 11.11. If $U$ is a bounded open subset of $X$, then
\[
\mu(U) = \sup_{K \subseteq U,\ K \text{ compact}} \mu(K).
\]


PROOF. For the bounded open set $U$, let $S_1, S_2, S_3$ be the three equal suprema
of Lemma 11.12. By definition, $\mu(U) = \mu(L) - \mu(L - U)$ for any compact
set $L$ containing $U$, and we are to prove that $\mu(U) = S_2$. If $K$ is a compact
subset of $U$, then $K \cup (L - U)$ is a disjoint union contained in $L$, and we have
$\mu(K) + \mu(L - U) = \mu(K \cup (L - U)) \le \mu(L)$. Taking the supremum over all
such $K$, we obtain $S_2 + \mu(L - U) \le \mu(L)$, i.e., $S_2 \le \mu(U)$.

Let $h$ be any member of $C_{\mathrm{com}}(X)$ with values in $[0,1]$ such that $h \ge I_{L-U}$ and
such that $h$ is 1 on an open neighborhood $N$ of $L - U$. Then $L \subseteq N \cup U$. For
each point $x$ of $U$, find an open neighborhood $U_x$ of $x$ with $U_x^{\mathrm{cl}} \subseteq U$. Then $N$
and the $U_x$'s form an open cover of $L$, and there is a finite subcover. Let us say
that $L \subseteq N \cup U_{x_1} \cup \cdots \cup U_{x_k}$. The set $K = U_{x_1}^{\mathrm{cl}} \cup \cdots \cup U_{x_k}^{\mathrm{cl}}$ is a compact subset
of $U$, and $L \subseteq N \cup K$. Choose, by Corollary 10.44, a function $f \in C_{\mathrm{com}}(X)$
with values in $[0,1]$ such that $f$ is 1 on $K$ and is 0 off $U$. This function has
$0 \le f \le I_U$. Since $f$ is 1 on $K$ and $h$ is 1 on $N$, $h + f$ is $\ge 1$ on $L$. Hence
$\mu(L) \le \ell(h + f) = \ell(h) + \ell(f) \le \ell(h) + S_3$. Thus $\mu(L) \le \mu(L - U) + S_3$
and $\mu(U) \le S_3$. Since $S_3 = S_2$ by Lemma 11.12, $\mu(U) = S_2$ as required.


PROOF OF EXISTENCE IN THEOREM 11.1. If $K$ is compact, we define $\mu(K)$, just
as we did earlier in this section, to be the infimum of $\ell(f)$ over all $f$ in $C_{\mathrm{com}}(X)$
such that $f \ge I_K$. Lemma 11.11 shows that $\mu$ extends, necessarily in a unique
fashion, to a well-defined nonnegative additive set function on $\mathcal{K}(X)$.

Consider the set $\mathcal{C}$ of all members $E$ of $\mathcal{K}(X)$ satisfying the following regularity
property: for each $\epsilon > 0$, there exist compact $K$ and open bounded $U$ with
$K \subseteq E \subseteq U$ and $\mu(U - K) < \epsilon$. Lemma 11.13 shows that every open bounded
set is in $\mathcal{C}$. We show closure of $\mathcal{C}$ under finite unions. If $E_1$ and $E_2$ are in $\mathcal{C}$, then
we can choose $K_1$ and $K_2$ compact and $U_1$ and $U_2$ bounded open such that $K_1 \subseteq
E_1 \subseteq U_1$, $K_2 \subseteq E_2 \subseteq U_2$, $\mu(U_1 - K_1) < \epsilon/2$, and $\mu(U_2 - K_2) < \epsilon/2$. Then
$K_1 \cup K_2 \subseteq E_1 \cup E_2 \subseteq U_1 \cup U_2$ and $(U_1 \cup U_2) - (K_1 \cup K_2) \subseteq (U_1 - K_1) \cup (U_2 - K_2)$.
It follows that $\mu((U_1 \cup U_2) - (K_1 \cup K_2)) \le \mu(U_1 - K_1) + \mu(U_2 - K_2) < \epsilon$,
and $\mathcal{C}$ is closed under finite unions.

We show closure of $\mathcal{C}$ under differences. If $E_1$ and $E_2$ are in $\mathcal{C}$, then we again
choose $K_1$ and $K_2$ compact and $U_1$ and $U_2$ bounded open such that $K_1 \subseteq E_1 \subseteq U_1$,
$K_2 \subseteq E_2 \subseteq U_2$, $\mu(U_1 - K_1) < \epsilon/2$, and $\mu(U_2 - K_2) < \epsilon/2$. Then $K_1 - U_2 \subseteq
E_1 - E_2 \subseteq U_1 - K_2$, and $(U_1 - K_2) - (K_1 - U_2) \subseteq (U_1 - K_1) \cup (U_2 - K_2)$.
Hence $\mu((U_1 - K_2) - (K_1 - U_2)) \le \mu(U_1 - K_1) + \mu(U_2 - K_2) < \epsilon$, and $\mathcal{C}$
is closed under differences. By Lemma 11.2, $\mathcal{C}$ equals $\mathcal{K}(X)$. Thus every set in
$\mathcal{K}(X)$ satisfies the regularity property.

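Next let us see that $\mu$ is completely additive on $\mathcal{C}$. Let $E_n$ be a disjoint sequence
of sets in $\mathcal{K}(X)$ with union $E$ in $\mathcal{K}(X)$. For every $N$, we have $\sum_{n=1}^{N} \mu(E_n) =
\mu(E_1 \cup \cdots \cup E_N) \le \mu(E)$. Hence $\sum_{n=1}^{\infty} \mu(E_n) \le \mu(E)$. For the reverse
inequality, let $\epsilon > 0$ be given. Choose, by the regularity property, $K$ compact and
$U_n$ open bounded with $K \subseteq E$, $E_n \subseteq U_n$, $\mu(E - K) < \epsilon$, and $\mu(U_n - E_n) < \epsilon/2^n$.
Then $K \subseteq E = \bigcup_{n=1}^{\infty} E_n \subseteq \bigcup_{n=1}^{\infty} U_n$. In other words, the sets $U_n$ form an open
cover of the compact set $K$. Some finite subcollection is a cover, and thus
$K \subseteq U_1 \cup \cdots \cup U_N$ for some $N$. Then we have
\[
\mu(E) = \mu(E - K) + \mu(K) \le \epsilon + \mu(U_1 \cup \cdots \cup U_N)
\le \epsilon + \sum_{n=1}^{N} \mu(U_n) \le \epsilon + \sum_{n=1}^{N} \bigl( \mu(E_n) + \epsilon/2^n \bigr) \le \sum_{n=1}^{\infty} \mu(E_n) + 2\epsilon.
\]
Since $\epsilon$ is arbitrary, $\mu(E) \le \sum_{n=1}^{\infty} \mu(E_n)$. Therefore $\mu(E) = \sum_{n=1}^{\infty} \mu(E_n)$, and
$\mu$ is completely additive on $\mathcal{K}(X)$.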

The Extension Theorem (Theorem 5.5) shows that $\mu$ extends uniquely to a
measure on the smallest $\sigma$-ring containing $\mathcal{K}(X)$, i.e., the $\sigma$-ring of $\sigma$-bounded
Borel sets. Proposition 5.37 shows further that $\mu$ extends canonically to a measure
on the $\sigma$-algebra of all Borel sets under the definition
\[
\mu(E) = \sup_{\substack{F \subseteq E,\ F \in \mathcal{B}(X), \\ F \ \sigma\text{-bounded}}} \mu(F).
\]
This defines $\mu$ on $\mathcal{B}(X)$. We are left with showing that $\mu$ is regular and that
$\ell(f) = \int_X f \, d\mu$ for every $f \in C_{\mathrm{com}}(X)$.

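In showing that $\ell(f) = \int_X f \, d\mu$ for every $f \in C_{\mathrm{com}}(X)$, it is enough to handle
an arbitrary $f \ge 0$. Fix $\epsilon > 0$, and fix an integer $N$ such that $\|f\|_{\sup} < N\epsilon$.
For $0 \le n \le N$, define $f_n = \min\{f, n\epsilon\}$. Each $f_n$ is in $C_{\mathrm{com}}(X)$, the function
$f_0$ is 0, and the function $f_N$ is $f$. For $0 \le n < N$, define $g_n = f_{n+1} - f_n$.
We can recover $f$ from the $g_n$'s as $f = \sum_{n=0}^{N-1} g_n$. For $n \ge 1$, define $K_n =
\{x \mid f(x) \ge n\epsilon\}$, and let $K_0 = \operatorname{support}(f)$. All the sets $K_n$ are compact, and
they decrease in size with $n$. In this notation the formula for $g_n$ is
\[
g_n(x) = \begin{cases} 0 & \text{if } x \notin K_n, \\ f(x) - n\epsilon & \text{if } x \in K_n - K_{n+1}, \\ \epsilon & \text{if } x \in K_{n+1}. \end{cases}
\]
Consequently
\[
\epsilon I_{K_{n+1}} \le g_n \le \epsilon I_{K_n}. \tag{$*$}
\]
Integration therefore gives
\[
\epsilon \mu(K_{n+1}) \le \int_X g_n \, d\mu \le \epsilon \mu(K_n). \tag{$\dagger$}
\]
The inequality given as $I_{K_{n+1}} \le \epsilon^{-1} g_n$ in $(*)$ implies that $\mu(K_{n+1}) \le \epsilon^{-1} \ell(g_n)$.
The other inequality $\epsilon^{-1} g_n \le I_{K_n}$ in $(*)$ says that any $h \in C_{\mathrm{com}}(X)$ with $I_{K_n} \le h$
has $\epsilon^{-1} g_n \le h$. Taking the infimum over $h$ yields $\epsilon^{-1} \ell(g_n) \le \mu(K_n)$. Thus we
have
\[
\epsilon \mu(K_{n+1}) \le \ell(g_n) \le \epsilon \mu(K_n). \tag{$\dagger\dagger$}
\]
Subtracting $(\dagger)$ and $(\dagger\dagger)$, we obtain
\[
-\epsilon \bigl( \mu(K_n) - \mu(K_{n+1}) \bigr) \le \int_X g_n \, d\mu - \ell(g_n) \le \epsilon \bigl( \mu(K_n) - \mu(K_{n+1}) \bigr).
\]
Since $f = \sum_{n=0}^{N-1} g_n$, summing from $n = 0$ to $n = N - 1$ gives
\[
\Bigl| \int_X f \, d\mu - \ell(f) \Bigr| \le \epsilon \sum_{n=0}^{N-1} \bigl( \mu(K_n) - \mu(K_{n+1}) \bigr) = \epsilon \mu(\operatorname{support}(f)).
\]
Since $\epsilon$ is arbitrary, $\bigl| \int_X f \, d\mu - \ell(f) \bigr| = 0$. Thus $\ell(f) = \int_X f \, d\mu$.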
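The slicing of $f$ into layers $g_n = f_{n+1} - f_n$ with $f_n = \min\{f, n\epsilon\}$ can be tried out on a toy model. The sketch below rests on assumptions that are not in the text: $X$ is a finite set with the discrete topology, $\mu$ is a weighted counting measure given by a list of weights, and the function names are hypothetical. Exact rational arithmetic is used so that the comparisons are not affected by rounding. In this model the support of $f$ is simply the set where $f > 0$.

```python
from fractions import Fraction


def layers(f, eps):
    """Return (N, g) with N*eps > max(f) and g[n] = f_{n+1} - f_n, f_n = min{f, n*eps}."""
    N = 1
    while max(f) >= N * eps:
        N += 1
    fn = [[min(v, n * eps) for v in f] for n in range(N + 1)]
    g = [[b - a for a, b in zip(fn[n], fn[n + 1])] for n in range(N)]
    return N, g


def integral(h, w):
    """Integral of h against the discrete measure with weights w."""
    return sum(hv * wv for hv, wv in zip(h, w))


def mu_K(f, w, n, eps):
    """Measure of K_n: K_0 is the support of f, and K_n = {f >= n*eps} for n >= 1."""
    if n == 0:
        return sum(wv for fv, wv in zip(f, w) if fv > 0)
    return sum(wv for fv, wv in zip(f, w) if fv >= n * eps)


f = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(7, 4)]  # sampled values of f
w = [1, 2, 3, 4]                                                # weights of the four points
eps = Fraction(1, 2)
N, g = layers(f, eps)
```

With these data the layers sum back to `f`, and each layer satisfies the two-sided bound $\epsilon\mu(K_{n+1}) \le \int g_n \, d\mu \le \epsilon\mu(K_n)$ used in the argument above.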

Fix a compact subset $K_0$ of $X$, form the $\sigma$-ring $\mathcal{B}(X) \cap K_0$, and let $\mathcal{A}(K_0)$
be the collection of members $E$ of $\mathcal{B}(X) \cap K_0$ such that $\mu(E)$ is the supremum
of $\mu(K)$ over all compact subsets $K$ of $E$ and $\mu(E)$ is the infimum of $\mu(U)$
over all bounded open sets in $X$ that contain $E$; the open sets in question need
not lie within $K_0$. Since the sets in $\mathcal{A}(K_0)$ all have finite measure, the regularity
condition on $E$ is that there exist, for each $\epsilon > 0$, $K$ compact and $U$ bounded open
with $K \subseteq E \subseteq U$ and $\mu(U - K) < \epsilon$. The same arguments as at the beginning of
the present proof show that $\mathcal{A}(K_0)$ is closed under finite unions and differences.
To see closure under countable disjoint unions, let $\{E_n\}$ be a disjoint sequence
in $\mathcal{A}(K_0)$ with union $E$, let $\epsilon$ be given, and choose $K_n$ compact and $U_n$ bounded
open with $K_n \subseteq E_n \subseteq U_n$ and $\mu(U_n - K_n) < \epsilon/2^n$. Applying Corollary 10.23,
let $L$ be a compact subset of $X$ with $K_0 \subseteq L^o$. The sets $K_n$ are disjoint, and
thus $\sum_{n=1}^{\infty} \mu(K_n)$ converges. Choose $N$ such that $\sum_{n=N+1}^{\infty} \mu(K_n) < \epsilon$. Define
$U = L^o \cap \bigcup_{n=1}^{\infty} U_n$, $K = \bigcup_{n=1}^{N} K_n$, $K_\infty = \bigcup_{n=1}^{\infty} K_n$, and $F = \bigcup_{n=N+1}^{\infty} K_n$.
Then $K$ is compact, $U$ is bounded open, and $K \subseteq E \subseteq U$. Since $K_\infty = K \cup F$,
we have
\[
\mu(U - K) \le \mu(U - K_\infty) + \mu(F) \le \mu\Bigl( \bigcup_{n=1}^{\infty} (U_n - K_n) \Bigr) + \mu\Bigl( \bigcup_{n=N+1}^{\infty} K_n \Bigr)
\le \sum_{n=1}^{\infty} \mu(U_n - K_n) + \sum_{n=N+1}^{\infty} \mu(K_n) \le \sum_{n=1}^{\infty} \epsilon/2^n + \epsilon = 2\epsilon.
\]
Thus $\mathcal{A}(K_0)$ is closed under countable disjoint unions and is a $\sigma$-ring. Since the
compact subsets of $K_0$ are in $\mathcal{A}(K_0)$, we conclude that $\mathcal{A}(K_0) = \mathcal{B}(K_0)$.

This proves regularity for all bounded sets. If $E$ is $\sigma$-bounded, we can choose
an increasing sequence $\{L_n\}$ of compact sets whose union contains $E$. Put $E_n =
E \cap L_n$. Given $\epsilon > 0$, we apply the previous step to choose $K_n$ compact and
$U_n$ bounded open such that $K_n \subseteq E_n \subseteq U_n$ and $\mu(U_n - K_n) < \epsilon/2^n$. Taking
$U = \bigcup_{n=1}^{\infty} U_n$ and $K_\infty = \bigcup_{n=1}^{\infty} K_n$, we have $K_\infty \subseteq E \subseteq U$ and $\mu(U - K_\infty) < \epsilon$.
Thus $\mu(U) \le \mu(E) + \epsilon$, and $\mu(E) \le \mu(K_\infty) + \epsilon$. The first of these inequalities,
being possible for any $\epsilon$, shows that $\mu(E)$ is the infimum of the measures of open
$\sigma$-bounded sets containing $E$. Since $\mu(K_\infty) = \lim_N \mu\bigl( \bigcup_{n=1}^{N} K_n \bigr)$ by complete
additivity, the second of these inequalities, being possible for any $\epsilon$, shows that
$\mu(E)$ is the supremum of the measures of compact sets contained in $E$.

This proves regularity for all $\sigma$-bounded sets. If $E$ is a Borel set that is not
$\sigma$-bounded, we know that $\mu(E)$ is the supremum of the measures $\mu(F)$ for
$\sigma$-bounded Borel subsets $F$ of $E$, and we know that $\mu(F)$ is the supremum of the
measures $\mu(K)$ for compact subsets $K$ of $F$. Therefore $\mu(E)$ is the supremum
of the measures $\mu(K)$ for compact subsets $K$ of $E$. This completes the proof
of regularity of $\mu$.
PROOF OF UNIQUENESS IN THEOREM 11.1. Let $\mu$ be the constructed measure,
and let $\nu$ be a second measure satisfying the properties of the theorem. The
assumed regularity of $\nu$ implies that it is enough to prove that $\nu(K) = \mu(K)$ for
every compact subset $K$ of $X$. Fix $K$, and let $a$ be the infimum defining $\mu(K)$,
namely the infimum of $\ell(f)$ over all $f \in C_{\mathrm{com}}(X)$ with values in $[0,1]$ such that
$I_K \le f$. Integrating this inequality with respect to $\nu$, we see that $\nu(K) \le \int_X f \, d\nu$
and therefore $\nu(K) \le a$. Suppose that $\nu(K) < a$. By Corollary 10.23 and the
assumed regularity of $\nu$, we can find a bounded open set $U$ with $U \supseteq K$ and
$\nu(U) < a$. By Corollary 10.44 we can find a function $g \in C_{\mathrm{com}}(X)$ with values
in $[0,1]$ such that $g$ is 1 on $K$ and is 0 off $U$. Then $I_K \le g \le I_U$. Hence
$\ell(g) = \int_X g \, d\mu = \int_X g \, d\nu \le \int_X I_U \, d\nu = \nu(U) < a \le \ell(g)$, and we obtain a
contradiction. We conclude that $\nu(K) = a = \mu(K)$, and the uniqueness follows.


3. Regular Borel Measures 


The fact that compact sets for a general locally compact Hausdorff $X$ need not be
countable intersections of open sets suggests a look at the ring of sets generated
by the compact sets that are indeed such intersections, as well as the associated
$\sigma$-algebra. The sets in this $\sigma$-algebra are known as "Baire sets," and it turns
out that the members of $C_{\mathrm{com}}(X)$ are measurable with respect to this $\sigma$-algebra.
The $\sigma$-algebra of Baire sets can be strictly smaller than the $\sigma$-algebra of Borel
sets, and thus one can make a case for limiting oneself to Baire sets all along.
This would be a fine point, one not worth pursuing here, but for one fact: the
$\sigma$-algebra of Baire sets for $X \times Y$ is a correct $\sigma$-algebra to use in Fubini's Theorem
for changing iterated integrals over $X$ and $Y$ to a double integral, and this may
not be true when Borel sets are used.

This fact about Fubini’s Theorem might seem to be a telling argument for 
replacing Borel sets by Baire sets everywhere in the theory. The difficulty is that 
it is a little tedious to check constantly whether sets are Baire sets, for example,
whether one-point sets are Baire sets. Thus the normal practice is to work with
Borel sets and to resort to Baire sets only when Fubini’s Theorem comes into play 
in a way that makes the distinction important. The most frequent case that arises 
in applications of Fubini’s Theorem in this theory is that a function on X x Y 
is continuous with compact support, in which case only Baire sets are involved 
anyway. 

Thus let $X$ be a locally compact Hausdorff space. The sets in the smallest
$\sigma$-algebra $\mathcal{B}(X)$ containing the compact sets are the Borel sets, and the sets in
the smallest $\sigma$-algebra $\mathcal{B}_0(X)$ containing the compact $G_\delta$'s are the Baire sets.
Measurable functions in the first case will be called Borel measurable functions
or Borel functions, and measurable functions in the second case will be called
Baire measurable functions or Baire functions. We shall observe in Corollary
11.16 below that every member of $C_{\mathrm{com}}(X)$ is a Baire function.

If the locally compact Hausdorff space $X$ is a metric space, then any closed
set $F$ is the intersection of the sets $U_n = \{x \mid D(x, F) < 1/n\}$, where $D(\cdot, F)$ is
the distance to the set $F$. Consequently every compact subset of $X$ is a $G_\delta$, and
every Borel set is a Baire set.


Proposition 11.14. If $K$ and $U$ are subsets of $X$ with $K$ compact, $U$ open,
and $K \subseteq U$, then there exist a compact $G_\delta$, say $K_0$, and an open bounded $F_\sigma$,
say $U_0$, such that $K \subseteq U_0 \subseteq K_0 \subseteq U$.


PROOF. Choose by Corollary 10.44 a member $f$ of $C_{\mathrm{com}}(X)$ with values in
$[0,1]$ such that $f$ is 1 on $K$ and is 0 on $U^c$. If $K_0$ is the set where $f$ is $\ge \frac{1}{2}$ and
$U_0$ is the set where $f$ is $> \frac{1}{2}$, then Lemma 11.5 shows that $K_0$ and $U_0$ have the
required properties.


Corollary 11.15. Any $\sigma$-compact open subset of $X$ is a Baire set.


PROOF. If $U = \bigcup_{n=1}^{\infty} K_n$ is open with each $K_n$ compact, we can apply
Proposition 11.14 to the inclusion $K_n \subseteq U$ and find a set $(K_n)_0$ that is a compact
$G_\delta$ and has $K_n \subseteq (K_n)_0 \subseteq U$. Then $U = \bigcup_{n=1}^{\infty} (K_n)_0$ exhibits $U$ as the countable
union of compact $G_\delta$'s, hence as a Baire set.


Corollary 11.16. Every member of $C_{\mathrm{com}}(X)$ is a Baire function.


PROOF. This is immediate from Lemma 11.5 and Corollary 11.15. 


Proposition 11.17. If $X$ and $Y$ are $\sigma$-compact, then the product $\sigma$-algebra for
$X \times Y$ obtained from the Baire sets of $X$ and $Y$ is the $\sigma$-algebra of Baire sets of
$X \times Y$.


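PROOF. If $K_X$ and $K_Y$ are compact $G_\delta$'s in $X$ and $Y$, then $K_X \times K_Y$ is a compact
$G_\delta$ in $X \times Y$, and it follows that $\mathcal{B}_0(X) \times \mathcal{B}_0(Y) \subseteq \mathcal{B}_0(X \times Y)$. For the reverse
inclusion let $K$ be a compact $G_\delta$ in $X \times Y$, and write $K$ as $K = \bigcap_{n=1}^{\infty} U_n$ with
each $U_n$ open. We construct open sets $S_n$ in $\mathcal{B}_0(X) \times \mathcal{B}_0(Y)$ with $K \subseteq S_n \subseteq U_n$,
and then it follows that $K = \bigcap_{n=1}^{\infty} S_n$ and $K$ is a Baire set.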

To do so, it is enough to show that if $K \subseteq W$ with $W$ open, then there is an
open set $S$ in $\mathcal{B}_0(X) \times \mathcal{B}_0(Y)$ with $K \subseteq S \subseteq W$. For each $(x, y)$ in $K$, find open
neighborhoods $U_x$ of $x$ and $V_y$ of $y$ such that $U_x \times V_y \subseteq W$. Proposition 11.14,
applied to the inclusion $\{x\} \subseteq U_x$ and then to the inclusion $\{y\} \subseteq V_y$, shows that
we may assume that $U_x$ and $V_y$ are open $F_\sigma$'s. In view of Corollary 11.15, they
are then Baire sets. Hence $U_x \times V_y$ is in $\mathcal{B}_0(X) \times \mathcal{B}_0(Y)$. As $(x, y)$ varies, the
sets $U_x \times V_y$ form an open cover of $K$, and there is a finite subcover. We can
take $S$ to be the union of the elements in the finite subcover, and then $S$ has the
required properties.
Now we turn our attention to measures. A Baire measure on $X$ is a measure
on the Baire sets that is finite on every compact $G_\delta$. The restriction of a Borel
measure to the Baire sets is a Baire measure. We are going to prove that Baire
measures are automatically regular in the same sense that Borel measures in $\mathbb{R}^N$
are automatically regular.


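Proposition 11.18. Every Baire measure $\mu$ is regular in the following sense:
\[
\mu(E) = \sup_{\substack{K \subseteq E, \\ K \text{ compact } G_\delta}} \mu(K) \quad \text{for every set } E \text{ in } \mathcal{B}_0(X),
\]
\[
\mu(E) = \inf_{\substack{U \supseteq E, \\ U \text{ open } F_\sigma}} \mu(U) \quad \text{for every } \sigma\text{-bounded set } E \text{ in } \mathcal{B}_0(X).
\]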


REMARK. Since Baire sets and Borel sets are the same in a metric space,
this proposition generalizes the known regularity of Borel measures on any open
subset of $\mathbb{R}^N$, as given in Theorem 6.25.


PROOF. If $L$ is a compact $G_\delta$, then $\mu(L)$ is certainly the supremum of $\mu(K)$
for the compact $G_\delta$'s contained in $L$. Suppose that $U$ is $\sigma$-bounded open with
$L \subseteq U$. Proposition 11.14 produces a bounded open set $U_0$ that is an $F_\sigma$ and has
$L \subseteq U_0 \subseteq U$. Consequently $\mu(L)$ is the infimum of $\mu(U_0)$ for the open $F_\sigma$'s
containing $L$. Thus every compact $G_\delta$ satisfies the stated regularity condition.

The remainder of the proof runs parallel to the proof of regularity at the end
of the proof of existence for Theorem 11.1, and we shall be brief. Fix a compact
$G_\delta$ in $X$, say $K_0$. Form the $\sigma$-ring $\mathcal{B}_0(X) \cap K_0$, and let $\mathcal{A}_0(K_0)$ be the collection
of members $E$ of $\mathcal{B}_0(X) \cap K_0$ such that $\mu(E)$ is the supremum of $\mu(K)$ over all
compact subsets $K$ of $E$ that are $G_\delta$'s and $\mu(E)$ is the infimum of $\mu(U)$ over all
open supersets $U$ of $E$ that are $F_\sigma$'s; the open sets in question need not lie within
$K_0$. Since the sets in $\mathcal{A}_0(K_0)$ all have finite measure, the regularity condition on
$E$ is that there exist, for each $\epsilon > 0$, $K$ compact and $U$ open of the correct kind
with $K \subseteq E \subseteq U$ and $\mu(U - K) < \epsilon$. The same arguments as earlier show that
$\mathcal{A}_0(K_0)$ is closed first under finite unions and differences, then under countable
disjoint unions. Thus $\mathcal{A}_0(K_0)$ is a $\sigma$-ring containing all compact $G_\delta$'s, and we
conclude that $\mathcal{A}_0(K_0) = \mathcal{B}_0(K_0)$.

This proves regularity for all bounded Baire sets. If the Baire set $E$ is
$\sigma$-bounded, we can choose an increasing sequence $\{L_n\}$ of compact $G_\delta$'s whose
union contains $E$. Put $E_n = E \cap L_n$. Then the same argument as earlier, using
the sets $E_n$, shows that the regularity condition holds for $E$.

Finally if $E$ is a Baire set that is not $\sigma$-bounded, we know that $\mu(E)$ is the
supremum of the measures $\mu(F)$ for $\sigma$-bounded Baire subsets $F$ of $E$, and we
know that $\mu(F)$ is the supremum of the measures $\mu(K)$ for compact subsets
$K$ of $F$ that are $G_\delta$'s. Therefore $\mu(E)$ is the supremum of the measures $\mu(K)$
for compact subsets $K$ of $E$ that are $G_\delta$'s.
Proposition 11.19. If $\nu$ is a Baire measure on $X$, then there is one and only
one regular Borel measure $\mu$ on $X$ whose restriction to the Baire sets is $\nu$.


PROOF. Since the members of $C_{\mathrm{com}}(X)$ are Baire functions (Corollary 11.16),
we can define a positive linear functional $\ell$ on $C_{\mathrm{com}}(X)$ by $\ell(f) = \int_X f \, d\nu$. The
uniqueness of the extending $\mu$ follows from the uniqueness part of Theorem 11.1.
For existence we take $\mu$ to be the regular Borel measure given by the existence part
of Theorem 11.1. We are to prove that $\mu$ and $\nu$ agree on Baire sets. The measures
$\mu$ and $\nu$ agree on compact $G_\delta$'s by Lemma 11.7a and dominated convergence.
By regularity of Baire measures (Proposition 11.18), $\mu$ and $\nu$ agree on all Baire
sets.


Proposition 11.20. Suppose that $X$ is compact and that $\mu$ and $\nu$ are Borel
measures on $X$ with $\mu$ regular. If $\nu$ is absolutely continuous with respect to $\mu$,
then $\nu$ is regular.


PROOF. Let $\epsilon > 0$ be given. The Radon–Nikodym Theorem (Theorem 9.16)
and Corollary 5.24 together show that there exists $\delta > 0$ such that any Borel
set $A$ with $\mu(A) < \delta$ has $\nu(A) < \epsilon$. Let $E$ be a Borel set to be tested for
regularity under $\nu$. Since $\mu$ is regular, we can choose $K$ compact and $U$ open
with $K \subseteq E \subseteq U$ and $\mu(U - K) < \delta$. Then $\nu(U - K) < \epsilon$, and it follows that
$\nu(E)$ is approximated within $\epsilon$ by $\nu(K)$ and $\nu(U)$.


Proposition 11.21. If $\mu$ is a regular Borel measure on $X$ and if $1 \le p < \infty$,
then
(a) $C_{\mathrm{com}}(X)$ is dense in $L^p(X, \mu)$,
(b) the smallest closed subspace of $L^p(X, \mu)$ containing all indicator
functions of compact $G_\delta$'s in $X$ is $L^p(X, \mu)$ itself.


REMARK. This generalizes conclusions (a) and (b) of Proposition 9.9 from
open subsets of $\mathbb{R}^N$ to all locally compact Hausdorff spaces.


PROOF. If $E$ is a Borel set of finite $\mu$ measure and if $\epsilon$ is given, the regularity
of $\mu$ allows us to choose a compact set $K$ with $K \subseteq E$ and $\mu(E - K) < \epsilon$.
Then we can find a bounded open set $U$ with $K \subseteq U$ and $\mu(U - K) < \epsilon$, and
Proposition 11.14 gives us a compact $G_\delta$ set $K_0$ such that $K \subseteq K_0 \subseteq U$. We have
$\int_X |I_E - I_K|^p \, d\mu = \mu(E - K) < \epsilon$, $\int_X |I_U - I_K|^p \, d\mu = \mu(U - K) < \epsilon$,
and $\int_X |I_U - I_{K_0}|^p \, d\mu = \mu(U - K_0) < \epsilon$. Consequently we see in succession
that the closure in $L^p(X, \mu)$ of the set of all indicator functions of compact sets
contains all indicator functions of Borel sets of finite $\mu$ measure, the closure in
$L^p(X, \mu)$ of the set of all indicator functions of bounded open sets contains all
indicator functions of Borel sets of finite $\mu$ measure, and the closure in $L^p(X, \mu)$
of the set of all indicator functions of compact $G_\delta$'s contains all indicator functions
of Borel sets of finite $\mu$ measure. Proposition 5.56 shows consequently that
the smallest closed subspace of $L^p(X, \mu)$ containing all indicator functions of
compact Baire sets is $L^p(X, \mu)$ itself. This proves (b).

For (a), let $K_0$ be a compact $G_\delta$, and use Lemma 11.7a to choose a decreasing
sequence $\{f_n\}$ of real-valued members of $C_{\mathrm{com}}(X)$ with pointwise limit $I_{K_0}$.
Since $f_1^p$ is integrable, dominated convergence yields $\lim_n \int_X |f_n - I_{K_0}|^p \, d\mu =
0$. Hence the closure of $C_{\mathrm{com}}(X)$ in $L^p(X, \mu)$ contains all indicator functions
of compact $G_\delta$'s. By Proposition 5.55d this closure contains the smallest closed
subspace of $L^p(X, \mu)$ containing all indicator functions of compact $G_\delta$'s.
Conclusion (b) shows that the latter subspace is $L^p(X, \mu)$ itself. This proves (a).


Corollary 11.22. Suppose that $X$ is a locally compact separable metric space.
If $\mu$ is a Borel measure on $X$ and if $1 \le p < \infty$, then
(a) $C_{\mathrm{com}}(X)$, as a normed linear space under the supremum norm, is
separable,
(b) $L^p(X, \mu)$ is separable.


REMARK. This generalizes Corollary 6.27c and Proposition 9.9c from open
subsets of $\mathbb{R}^N$ to all locally compact separable metric spaces. The measure
$\mu$ is automatically regular by Proposition 11.8 since Baire measures and Borel
measures coincide in any locally compact metric space.


PROOF. Part (a) is proved by the same argument as for Corollary 6.27c. What
is required is a substitute for Lemma 6.22a in order to obtain a sequence $\{F_n\}_{n=1}^\infty$
of compact subsets of $X$ with union $X$ such that $F_n \subseteq F_{n+1}^o$ for all $n$. It was
observed at the beginning of Section X.3 that separable implies Lindelöf, and it
follows from Proposition 10.24 that $X$ is consequently $\sigma$-compact. Application
of Proposition 10.25 then gives the sequence $\{F_n\}_{n=1}^\infty$. Corollary 2.59 is still to
be applied to $C(F_n)$; since $F_n$ is a compact metric space, the corollary shows that
$C(F_n)$ is separable, and the argument goes through.

Part (b) follows from (a) and Proposition 11.21a in the same way that Corollary
6.27d follows from parts (a) and (c) of that corollary. The sequence $\{F_n\}_{n=1}^\infty$ of
the previous paragraph is to be used in the argument.


Theorem 11.23 (Helly–Bray Theorem). Let $X$ be a locally compact separable
metric space. If $\{\mu_n\}$ is a sequence of Borel measures on $X$ with $\{\mu_n(X)\}$ bounded,
say by $M$, then there exist a Borel measure $\mu$ on $X$ and a subsequence $\{\mu_{n_k}\}$ such
that $\mu(X) \le M$ and $\lim_k \int_X f\,d\mu_{n_k} = \int_X f\,d\mu$ for all $f$ in $C_{\mathrm{com}}(X)$.


REMARKS. In the terminology of Section V.9, the measures $\mu_n$ are continuous
linear functionals on the normed linear space $C_{\mathrm{com}}(X)$, and the norm of the
linear functional corresponding to $\mu_n$ is $\mu_n(X)$. The convergence is weak-star
convergence, and the limiting linear functional is given by a Borel measure $\mu$
with $\mu(X) \le M$. The theorem amounts to an application of the preliminary
form of Alaoglu's Theorem (Theorem 5.58) and the identification of the limit as
a measure.
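
To see why the theorem asserts only the inequality $\mu(X) \le M$, the following
small worked example may help. It is an illustration under assumed data: the
choice $X = \mathbb{R}$ and the point masses $\delta_n$ are not taken from the text.

```latex
Let $X = \mathbb{R}$, and let $\mu_n = \delta_n$ be the point mass at the integer $n$,
so that $\mu_n(X) = 1$ for all $n$ and we may take $M = 1$.  If $f$ is in
$C_{\mathrm{com}}(\mathbb{R})$, then $f$ vanishes outside some interval $[-R, R]$, and
\[
  \int_X f \, d\mu_n = f(n) = 0 \qquad \text{for all } n > R .
\]
Hence $\lim_n \int_X f \, d\mu_n = 0 = \int_X f \, d\mu$ with $\mu = 0$, for every $f$ in
$C_{\mathrm{com}}(\mathbb{R})$.  Every subsequence has the same limit, and
\[
  \mu(X) = 0 < 1 = M .
\]
```

In this example the mass escapes to infinity, which test functions of compact
support cannot detect, so equality $\mu(X) = M$ cannot be expected in general.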


PROOF. The proof consists of filling in the details in the remarks above.
We regard $Y = C_{\mathrm{com}}(X)$ as a normed linear space with the supremum norm.
Any Borel measure $\nu$ on $X$ defines by integration a linear functional on $Y$ with
norm given by $\|\nu\| = \sup_{f \in C_{\mathrm{com}}(X),\ \|f\|_{\sup} \le 1} \left|\int_X f\,d\nu\right|$. Each such integral
is at most $\|f\|_{\sup}\,\nu(X)$, so the right side is certainly $\le \nu(X)$. In the reverse
direction, let $\{K_n\}$ be an increasing sequence of compact subsets of $X$ with
union $X$, so that $\lim_n \nu(K_n) = \nu(X)$. Choose functions $f_n : X \to [0, 1]$ in
$C_{\mathrm{com}}(X)$ by Corollary 10.44 such that $f_n$ is 1 on $K_n$. Then $\|f_n\|_{\sup} \le 1$ for
all $n$, and $\int_X f_n\,d\nu \ge \int_{K_n} d\nu = \nu(K_n)$. Hence $\|\nu\| \ge \limsup_n \nu(K_n) = \nu(X)$,
and we conclude that $\|\nu\| = \nu(X)$.

Thus the given sequence $\{\mu_n\}$ corresponds to a sequence in $Y^*$ with $\|\mu_n\| \le M$
for all $n$. Corollary 11.22 shows that $Y$ is separable. Theorem 5.58 therefore
applies and yields a subsequence $\{\mu_{n_k}\}$ and a member $\ell$ of $Y^*$ with $\|\ell\| \le M$
such that $\lim_k \int_X f\,d\mu_{n_k} = \ell(f)$ for all $f$ in $C_{\mathrm{com}}(X)$. If $f \ge 0$, $\lim_k \int_X f\,d\mu_{n_k}$
is certainly $\ge 0$, and thus $\ell$ is a positive linear functional on $C_{\mathrm{com}}(X)$. The Riesz
Representation Theorem (Theorem 11.1) produces a Borel measure $\mu$ on $X$ with
$\ell(f) = \int_X f\,d\mu$ for all $f$ in $C_{\mathrm{com}}(X)$. Since $\|\ell\| \le M$, we have $\mu(X) \le M$.


We continue in this section with $X$ as a locally compact Hausdorff space. We
now change the point of view a little and regard $C_{\mathrm{com}}(X)$ as a normed linear space
under the supremum norm $\|f\|_{\sup} = \sup_{x \in X} |f(x)|$. The problem is to identify
all continuous linear functionals on this normed linear space. We shall see shortly
that it is enough to handle the case that $X$ is compact.

If $X^*$ is the one-point compactification of $X$, then two spaces to be considered
in conjunction with $C_{\mathrm{com}}(X)$ are $C(X^*)$, the space of continuous scalar-valued
functions on $X^*$, and $C_0(X)$, the space of continuous scalar-valued functions on
$X$ that "vanish at infinity." When applied to a function $f$, the term vanishes at
infinity means that for any $\epsilon > 0$, there is some compact set with the property
that $|f(x)| < \epsilon$ outside that set. It is equivalent to say that $f$ extends to a member
of $C(X^*)$ that is 0 at $\infty$.

The three spaces $C_{\mathrm{com}}(X)$, $C_0(X)$, and $C(X^*)$ are related. In the first place,
$C_{\mathrm{com}}(X)$ is dense in $C_0(X)$. In fact, if $f$ is in $C_0(X)$ and if $\epsilon > 0$ is given, we find
$K$ compact with $|f(x)| < \epsilon$ outside $K$. Corollary 10.44 supplies a member $g$ of
$C_{\mathrm{com}}(X)$ with values in $[0, 1]$ that is 1 on $K$. Then the product $fg$ is in $C_{\mathrm{com}}(X)$,
and $\|f - fg\|_{\sup} \le \epsilon$. Thus $C_{\mathrm{com}}(X)$ is dense in $C_0(X)$. Any continuous
linear functional on $C_{\mathrm{com}}(X)$ is uniformly continuous by Proposition 5.57, and
Proposition 2.47 shows that it extends uniquely to a continuous linear functional
on $C_0(X)$. Thus the continuous linear functionals on $C_0(X)$ and $C_{\mathrm{com}}(X)$ are in
one-one correspondence by restriction.

If we identify $C_0(X)$ as the subspace of $C(X^*)$ of functions equal to 0 at $\infty$,
then every continuous linear functional on $C(X^*)$ restricts to a continuous linear
functional on $C_0(X)$. In the reverse direction every continuous linear functional
on $C_0(X)$ extends (nonuniquely) to a continuous linear functional on $C(X^*)$. In
fact, let $\ell_0$ be a continuous linear functional on $C_0(X)$, and fix a member $f_0$ of
$C(X^*)$ with $f_0(\infty) = 1$. If $f$ is any member of $C(X^*)$, then $f - f(\infty) f_0$ is in
$C_0(X)$ and it makes sense to define $\ell(f) = \ell_0(f - f(\infty) f_0)$. Since

\[
\begin{aligned}
|\ell(f)| = |\ell_0(f - f(\infty) f_0)| &\le \|\ell_0\|\,\|f - f(\infty) f_0\|_{\sup} \\
&\le \|\ell_0\| \big(\|f\|_{\sup} + |f(\infty)|\,\|f_0\|_{\sup}\big) \le \|\ell_0\| \big(1 + \|f_0\|_{\sup}\big) \|f\|_{\sup},
\end{aligned}
\]

$\ell$ is bounded on $C(X^*)$ and is therefore continuous. Thus the study of continuous
linear functionals on $C_{\mathrm{com}}(X)$ reduces to the case that $X$ is compact.

The first result below shows that any continuous linear functional on C (X) with 
X compact is a finite linear combination of positive linear functionals. In view of 
Theorem 11.1, it is therefore given as a finite linear combination of integrations 
with respect to regular Borel measures. The remainder of the section will be 
devoted to making this result look tidier and seeing what happens to various 
norms under the correspondence. 


Proposition 11.24. Let $X$ be a compact Hausdorff space, and let $\ell$ be a
continuous linear functional on $C(X)$. If $\ell$ takes real values on real-valued
functions, define, for $f \ge 0$ in $C(X)$,

\[
\ell^+(f) = \sup_{0 \le g \le f} \ell(g) \qquad \text{and} \qquad \ell^-(f) = \ell^+(f) - \ell(f);
\]

then $\ell^+$ and $\ell^-$ extend to positive linear functionals on $C(X)$ such that $\ell = \ell^+ - \ell^-$.
If $\ell$ does not necessarily take real values on real-valued functions, then $\ell$ is a
complex linear combination of positive linear functionals on $C(X)$.
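
The definition of $\ell^+$ can be tried out numerically in the simplest possible
setting. The sketch below assumes that $X$ is a finite set with $n$ points, so that
a member of $C(X)$ is a list of $n$ numbers and a real functional has the form
$\ell(f) = \sum_i w_i f_i$; the weights `w` and the functions `f1`, `f2` are hypothetical
values chosen only for illustration. It compares a brute-force supremum with
the closed form obtained by discarding the negative weights, and it checks
additivity of $\ell^+$ on one pair of nonnegative functions.

```python
from itertools import product

def ell(w, f):
    """Evaluate ell(f) = sum_i w[i] * f[i] on a finite space X = {0, ..., n-1}."""
    return sum(wi * fi for wi, fi in zip(w, f))

def ell_plus_bruteforce(w, f):
    """sup of ell(g) over 0 <= g <= f, found by checking the vertices g[i] in {0, f[i]}.

    A linear function on a box attains its supremum at a vertex, so checking
    the 2^n vertices is enough when X is finite.
    """
    vertices = product(*[(0.0, fi) for fi in f])
    return max(ell(w, g) for g in vertices)

def ell_plus_closed_form(w, f):
    """Keep only the nonnegative weights; this is what the supremum works out to."""
    return sum(max(wi, 0.0) * fi for wi, fi in zip(w, f))

def ell_minus(w, f):
    """ell^-(f) = ell^+(f) - ell(f), as in the proposition."""
    return ell_plus_bruteforce(w, f) - ell(w, f)

w = [2.0, -3.0, 0.5]   # hypothetical weights defining ell
f1 = [1.0, 2.0, 4.0]   # two nonnegative "functions" on X
f2 = [0.5, 1.0, 0.0]
f_sum = [a + b for a, b in zip(f1, f2)]

print(ell_plus_bruteforce(w, f1), ell_plus_closed_form(w, f1))
print(ell_minus(w, f1))
print(ell_plus_bruteforce(w, f_sum),
      ell_plus_bruteforce(w, f1) + ell_plus_bruteforce(w, f2))
```

In this finite model $\ell^+$ and $\ell^-$ are integration against the positive and
negative parts of the weights, which is the discrete counterpart of the
decomposition into positive linear functionals.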


PROOF. The functions $f$ and $g$ in this argument will all be in $C(X)$. For general
$\ell$ not necessarily taking real values on real-valued functions, define $\bar{\ell}(f) = \overline{\ell(\bar{f})}$.
We readily check that $\bar{\ell}$ is a continuous linear functional on $C(X)$, that $\ell_R =
\frac{1}{2}(\ell + \bar{\ell})$ and $\ell_I = \frac{1}{2i}(\ell - \bar{\ell})$ are continuous linear functionals on $C(X)$ taking
real values on real-valued functions, and that $\ell = \ell_R + i\ell_I$ exhibits $\ell$ as a
complex linear combination of continuous linear functionals taking real values
on real-valued functions. This reduces the proposition to the case that $\ell$ takes real
values on real-valued functions.

In this case, for $f \ge 0$, inspection gives the following: $\ell(f) = \ell^+(f) - \ell^-(f)$,
$\ell^+(0) = \ell^-(0) = 0$, $\ell^+(cf) = c\,\ell^+(f)$ for $c > 0$, and $\ell^-(cf) = c\,\ell^-(f)$ for
$c > 0$. In addition, $\ell^+(f) \ge 0$ for $f \ge 0$ because

\[
\ell^+(f) = \sup_{0 \le g \le f} \ell(g) \ge \ell(0) = 0,
\]

and $\ell^-(f) \ge 0$ for $f \ge 0$ because

\[
\ell^-(f) = \ell^+(f) - \ell(f) = \sup_{0 \le g \le f} \ell(g) - \ell(f) \ge \ell(f) - \ell(f) = 0.
\]

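To complete the proof, all that we have to do is show that $\ell^+(f_1 + f_2) =
\ell^+(f_1) + \ell^+(f_2)$ whenever $f_1 \ge 0$ and $f_2 \ge 0$. The argument for $\ge$ is that

\[
\begin{aligned}
\ell^+(f_1 + f_2) = \sup_{0 \le g \le f_1 + f_2} \ell(g)
&\ge \sup_{\substack{g_1, g_2, \\ 0 \le g_1 \le f_1, \\ 0 \le g_2 \le f_2}} \ell(g_1 + g_2) \\
&= \sup_{0 \le g_1 \le f_1} \ell(g_1) + \sup_{0 \le g_2 \le f_2} \ell(g_2) = \ell^+(f_1) + \ell^+(f_2).
\end{aligned}
\]

For the reverse direction, let $g$ be arbitrary with $0 \le g \le f_1 + f_2$, and set
$g_1 = \min\{g, f_1\}$ and $g_2 = g - g_1$. Certainly $0 \le g_1 \le f_1$. Let us show that
$0 \le g_2 \le f_2$. In fact,

\[
\begin{aligned}
g_2 = g - g_1 &= (g + f_1) - (f_1 + g_1) = \max\{g, f_1\} + \min\{g, f_1\} - (f_1 + g_1) \\
&= \max\{g, f_1\} + g_1 - (f_1 + g_1) = \max\{g, f_1\} - f_1.
\end{aligned}
\]

Thus $g_2$ is certainly $\ge 0$. In addition, the computation

\[
g_2 = \max\{g, f_1\} - f_1 \le \max\{f_1 + f_2, f_1\} - f_1 = f_1 + f_2 - f_1 = f_2
\]

shows that $g_2$ is $\le f_2$. Thus any $g$ with $0 \le g \le f_1 + f_2$ gives us a corresponding
decomposition

\[
\begin{aligned}
\ell(g) = \ell(g_1 + g_2) &= \ell(g_1) + \ell(g_2) \\
&\le \sup_{0 \le g_1 \le f_1} \ell(g_1) + \sup_{0 \le g_2 \le f_2} \ell(g_2) = \ell^+(f_1) + \ell^+(f_2).
\end{aligned}
\]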


Taking the supremum over $g$, we obtain $\ell^+(f_1 + f_2) \le \ell^+(f_1) + \ell^+(f_2)$, and the
proof is complete.


Let us reinterpret matters in terms of Borel measures. We begin with the real-valued
case. Recall from Section IX.3 that a real-valued completely additive set
function $\rho$ on a $\sigma$-algebra is called a signed measure. It is bounded if $|\rho(E)| \le C$
for all $E$ in the algebra. In this case Theorem 9.14 shows that it has a Jordan
decomposition $\rho = \rho^+ - \rho^-$, where $\rho^+$ and $\rho^-$ are uniquely determined finite
measures such that any decomposition $\rho = \nu^+ - \nu^-$ as the difference of finite
measures has $\rho^+ \le \nu^+$ and $\rho^- \le \nu^-$. We say that a bounded signed measure $\rho$
on the Borel sets of the compact Hausdorff space $X$ is a regular Borel signed
measure if its Jordan decomposition is into regular Borel measures. If $\rho =
\nu^+ - \nu^-$ is any decomposition of a bounded signed measure $\rho$ on the Borel sets
as the difference of regular Borel measures, then the inequalities $\rho^+ \le \nu^+$ and
$\rho^- \le \nu^-$ that compare the decomposition with the Jordan decomposition force
$\rho^+$ and $\rho^-$ to be regular, in view of Proposition 11.20. Hence $\rho$ is a regular Borel
signed measure.

The regular Borel signed measures form a real vector space $M(X, \mathbb{R})$. To
see closure under vector space operations, we observe from the definition of
regularity that the sum of two (nonnegative) regular Borel measures is a regular
Borel measure. From this fact we can see that the sum of two regular Borel signed
measures is regular and hence that $M(X, \mathbb{R})$ is closed under addition: in fact, if
$\rho = \rho^+ - \rho^-$ and $\sigma = \sigma^+ - \sigma^-$ are given in their Jordan decompositions, then
the formula $(\rho + \sigma)^+ - (\rho + \sigma)^- = (\rho^+ + \sigma^+) - (\rho^- + \sigma^-)$ shows that $\rho + \sigma$ is
the difference of two regular Borel measures and hence is regular. Thus $M(X, \mathbb{R})$
is a real vector space.


Proposition 11.25. The real vector space $M(X, \mathbb{R})$ becomes a real normed
linear space under the definition $\|\rho\| = \rho^+(X) + \rho^-(X)$, where $\rho = \rho^+ - \rho^-$ is
the Jordan decomposition of $\rho$.


PROOF. Certainly $\|\rho\| \ge 0$ with equality if and only if $\rho = 0$. Also, if $\rho$
has the Jordan decomposition $\rho = \rho^+ - \rho^-$, then $-\rho = \rho^- - \rho^+$ is the Jordan
decomposition of $-\rho$, and it follows that $\|c\rho\| = |c|\,\|\rho\|$ for any real scalar $c$.

Finally consider $\|\rho + \sigma\|$. If $\rho = \rho^+ - \rho^-$ and $\sigma = \sigma^+ - \sigma^-$ are Jordan
decompositions, then the formula $(\rho + \sigma)^+ - (\rho + \sigma)^- = (\rho^+ + \sigma^+) - (\rho^- + \sigma^-)$
shows that $(\rho + \sigma)^+ \le \rho^+ + \sigma^+$ and hence $(\rho + \sigma)^+(X) \le \rho^+(X) + \sigma^+(X)$.
Similarly $(\rho + \sigma)^-(X) \le \rho^-(X) + \sigma^-(X)$. Adding these inequalities, we obtain
$\|\rho + \sigma\| \le \|\rho\| + \|\sigma\|$.
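
The inequality $(\rho + \sigma)^+ \le \rho^+ + \sigma^+$ used in the proof can be strict, and so can
the triangle inequality. A minimal worked example, assuming a two-point space
$X = \{a, b\}$ and point masses $\delta_a$, $\delta_b$ chosen purely for illustration:

```latex
Let $X = \{a, b\}$ be a two-point space, and put
\[
  \rho = \delta_a - \delta_b, \qquad \sigma = \delta_b .
\]
Then $\rho^+ = \delta_a$, $\rho^- = \delta_b$, $\sigma^+ = \delta_b$, and $\sigma^- = 0$, so that
$\|\rho\| = 2$ and $\|\sigma\| = 1$.  The sum is $\rho + \sigma = \delta_a$, whose Jordan
decomposition has $(\rho + \sigma)^+ = \delta_a$ and $(\rho + \sigma)^- = 0$.  Thus
\[
  (\rho + \sigma)^+ = \delta_a \le \delta_a + \delta_b = \rho^+ + \sigma^+,
  \qquad
  \|\rho + \sigma\| = 1 < 3 = \|\rho\| + \|\sigma\| .
\]
```

Here $(\rho^+ + \sigma^+) - (\rho^- + \sigma^-)$ is a decomposition of $\rho + \sigma$ as a difference of
measures, but it is not the Jordan decomposition, because the mass at $b$ cancels.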


Returning to the statement of Proposition 11.24, let us write $C(X, \mathbb{R})$ or
$C(X, \mathbb{C})$ for the space of continuous scalar-valued functions when the field of
scalars is important, reserving the expression $C(X)$ for situations in which the
scalars do not matter. Suppose that $\ell$ is a continuous linear functional on $C(X)$ that
takes real values on real-valued functions. The proposition shows that $\ell$ is the
difference of two positive linear functionals. By Theorem 11.1, $\ell$ operates as the
difference of two integrations: $\ell(f) = \int_X f\,d\nu^+ - \int_X f\,d\nu^-$, where $\nu^+$ and $\nu^-$
are the regular Borel measures corresponding to $\ell^+$ and $\ell^-$. Then $\ell$ corresponds
to a regular Borel signed measure $\rho$ and is given by integration: $\ell(f) = \int_X f\,d\rho$,
the integral with respect to the signed measure being interpreted as the difference
of two integrals with respect to measures. Conversely any regular Borel signed
measure $\rho$ yields a continuous linear functional $\ell$ on $C(X)$ by the definition
$\ell(f) = \int_X f\,d\rho$.

In particular the passage to integration gives us a real-linear mapping of
$M(X, \mathbb{R})$ onto the space $C(X, \mathbb{R})^*$ of continuous linear functionals on the real
vector space $C(X, \mathbb{R})$. Both of these spaces are normed linear spaces, and the
theorem is that the map is one-one and that the norms match.


Theorem 11.26. The real-linear map of $M(X, \mathbb{R})$ onto $C(X, \mathbb{R})^*$ given by
$\rho \mapsto \ell$ with $\ell(f) = \int_X f\,d\rho$ is one-one and norm preserving.


REMARK. As in Section V.9 the norm $\|\ell\|$ of $\ell$ is the least constant $C$ such that
$|\ell(f)| \le C \|f\|_{\sup}$ for all $f$. The constant $C$ equals the supremum of $|\ell(f)|$ over
all $f$ with $\|f\|_{\sup} \le 1$.


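PROOF. To see that the map is one-one, suppose that $\int_X f\,d\rho = 0$ for all $f$ in
$C(X, \mathbb{R})$. Then $\int_X f\,d\rho^+ = \int_X f\,d\rho^-$ for all such $f$, and the uniqueness part
of Theorem 11.1 shows that $\rho^+ = \rho^-$. Hence $\rho = \rho^+ - \rho^- = 0$.

Now suppose that $\ell$ and $\rho$ correspond. Then we have

\[
\begin{aligned}
|\ell(f)| = \Big|\int_X f\,d\rho^+ - \int_X f\,d\rho^-\Big|
&\le \int_X |f|\,d\rho^+ + \int_X |f|\,d\rho^- \\
&\le \rho^+(X)\,\|f\|_{\sup} + \rho^-(X)\,\|f\|_{\sup}.
\end{aligned}
\]

Taking the supremum over all $f$ with $\|f\|_{\sup} \le 1$, we obtain

\[
\|\ell\| \le \rho^+(X) + \rho^-(X) = \|\rho\|.
\]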


For the inequality in the reverse direction, let $\epsilon > 0$ be given, and let $X = P \cup N$
be a Hahn decomposition (Theorem 9.15) for $\rho$. By regularity of $\rho^+$ on $P$
and $\rho^-$ on $N$, choose compact subsets $K_P$ and $K_N$ with $K_P \subseteq P$, $K_N \subseteq N$,
$\rho^+(P - K_P) < \epsilon$, and $\rho^-(N - K_N) < \epsilon$. Since $\rho^+(N) = 0$ and $\rho^-(P) = 0$,

\[
\rho^+(X - K_P) < \epsilon \qquad \text{and} \qquad \rho^-(X - K_N) < \epsilon. \tag{$*$}
\]

By Urysohn's Lemma (Corollary 10.43), we can find a continuous function
$f : X \to [-1, 1]$ such that $f$ is 1 on $K_P$ and is $-1$ on $K_N$. Then

\[
\begin{aligned}
\big|\ell(f) - \|\rho\|\big|
&\le \Big|\int_{K_P} f\,d\rho - \rho^+(X)\Big| + \Big|\int_{K_N} f\,d\rho - \rho^-(X)\Big| + \Big|\int_{K_P^c \cap K_N^c} f\,d\rho\Big| \\
&= |\rho^+(K_P) - \rho^+(X)| + |\rho^-(K_N) - \rho^-(X)| + \Big|\int_{K_P^c \cap K_N^c} f\,d\rho\Big|.
\end{aligned}
\]

By $(*)$ the first two terms on the right side are each $< \epsilon$. Since $\rho^+(K_P^c \cap K_N^c) =
\rho^+(P - K_P) < \epsilon$ and $\rho^-(K_P^c \cap K_N^c) = \rho^-(N - K_N) < \epsilon$, and since $\|f\|_{\sup} \le 1$,
the third term on the right side is $< 2\epsilon$. Therefore $\big|\ell(f) - \|\rho\|\big| < 4\epsilon$, and our
function $f$ has the property that $|\ell(f)| \ge (\|\rho\| - 4\epsilon)\|f\|_{\sup}$. In other words,
$\|\ell\| \ge \|\rho\| - 4\epsilon$. Since $\epsilon$ is arbitrary, $\|\ell\| \ge \|\rho\|$. This completes the proof.


Now let us consider the case in which the values are complex. A regular
Borel complex measure on the compact Hausdorff space $X$ is an expression
$\rho = \rho_R + i\rho_I$ in which $\rho_R$ and $\rho_I$ are regular Borel signed measures. In other
words, it is a complex-valued set function whose real and imaginary parts are
regular Borel signed measures. The space $M(X, \mathbb{C})$ of these is a complex vector
space, and we shall make it into a normed linear space shortly. Meanwhile,
the space $C(X, \mathbb{C})^*$ of continuous linear functionals on $C(X, \mathbb{C})$ is a complex
normed linear space. Extending the definition of $\int_X f\,d\rho$ to handle members of
$M(X, \mathbb{C})$, we see from Proposition 11.24 that the complex-linear map of $M(X, \mathbb{C})$
into $C(X, \mathbb{C})^*$ given by $\rho \mapsto \ell$ with $\ell(f) = \int_X f\,d\rho$ is one-one and onto.

To have a theorem in this case that parallels Theorem 11.26, we need to define
the norm on $M(X, \mathbb{C})$. Doing so on an element $\rho$ is not just a matter of combining
the norms of the real and imaginary parts of $\rho$, any more than writing the norm of
a complex-valued $L^1$ function can be done in terms of the $L^1$ norms of the real
and imaginary parts. A more subtle definition is needed.

We define the total variation $|\rho|$ of a member $\rho$ of $M(X, \mathbb{C})$ to be the nonnegative
set function whose value on a Borel set $E$ is the supremum of all finite
sums $\sum_{j=1}^n |\rho(E_j)|$ with $E = \bigcup_{j=1}^n E_j$ disjointly. The total-variation norm of
the member $\rho$ of $M(X, \mathbb{C})$ is defined to be $\|\rho\| = |\rho|(X)$. It is a simple matter
to verify that the total-variation norm is indeed a norm.
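
The definition can be explored on a finite set, where every measure is a finite
list of point masses. The sketch below is only an illustration under assumed
data: the two-point set and the masses in `rho` are hypothetical. It evaluates
the partition sums $\sum_j |\rho(E_j)|$ for the partition into singletons (which
attains the supremum on a finite set, since refining a partition can only
increase the sum by the triangle inequality) and for the trivial partition, and
it compares the result with two naive combinations of the norms of the real
and imaginary parts.

```python
import math

def block_sum(rho, partition):
    """Sum of |rho(E_j)| over the blocks E_j of a partition of the finite set X."""
    return sum(abs(sum(rho[x] for x in block)) for block in partition)

def total_variation_norm(rho):
    """||rho|| = |rho|(X); on a finite X the singleton partition attains the supremum."""
    return block_sum(rho, [[x] for x in rho])

def real_signed_norm(masses):
    """rho^+(X) + rho^-(X) for a real-valued measure given by its point masses."""
    return sum(abs(m) for m in masses)

# Hypothetical complex measure on X = {a, b}, given by its point masses.
rho = {"a": 1 + 1j, "b": -1j}

tv = total_variation_norm(rho)                    # |1 + i| + |-i|
coarse = block_sum(rho, [["a", "b"]])             # |rho(X)| for the trivial partition
norm_re = real_signed_norm(m.real for m in rho.values())
norm_im = real_signed_norm(m.imag for m in rho.values())

print(tv, coarse, norm_re + norm_im, math.hypot(norm_re, norm_im))
```

For these masses neither the sum nor the Euclidean combination of the norms of
the real and imaginary parts agrees with the total-variation norm, which is
the point of the remark that a more subtle definition is needed.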


Proposition 11.27. The total variation $|\rho|$ of a member $\rho$ of $M(X, \mathbb{C})$ is a
regular Borel measure, there exists a Borel function $h$ with $\|h\|_{\sup} \le 1$ such
that $\rho = h\,d|\rho|$, and the total-variation norm on $M(X, \mathbb{C})$ makes $M(X, \mathbb{C})$ into
a normed linear space in such a way that $\left|\int_X f\,d\rho\right| \le \|\rho\|\,\|f\|_{\sup}$ for every
bounded Borel function $f$. Moreover, $|\rho|$ equals $\rho^+ + \rho^-$ if $\rho$ is real valued and
has $\rho = \rho^+ - \rho^-$ as its Jordan decomposition.


REMARK. It follows that if $\rho$ is real valued and if $X = P \cup N$ is a Hahn
decomposition (Theorem 9.15) for $\rho$, then the corresponding function $h$ may be
taken to be $+1$ on $P$ and $-1$ on $N$.


PROOF. To see that $|\rho|$ is additive, let $E$ and $F$ be disjoint Borel sets. If
$E = \bigcup_{i=1}^m E_i$ disjointly and $F = \bigcup_{j=1}^n F_j$ disjointly, then $E \cup F =
\big(\bigcup_{i=1}^m E_i\big) \cup \big(\bigcup_{j=1}^n F_j\big)$ disjointly, and hence $\sum_{i=1}^m |\rho(E_i)| + \sum_{j=1}^n |\rho(F_j)| \le
|\rho|(E \cup F)$. Taking the supremum over systems $\{E_i\}$ and then over systems
$\{F_j\}$, we obtain $|\rho|(E) + |\rho|(F) \le |\rho|(E \cup F)$. In the reverse direction let
$E \cup F = \bigcup_{k=1}^p G_k$ disjointly. Then $E = \bigcup_{k=1}^p (E \cap G_k)$ disjointly, and
$F = \bigcup_{k=1}^p (F \cap G_k)$ disjointly. Hence

\[
\sum_{k=1}^p |\rho(G_k)| = \sum_{k=1}^p |\rho(E \cap G_k) + \rho(F \cap G_k)| \le \sum_{k=1}^p |\rho(E \cap G_k)| + \sum_{k=1}^p |\rho(F \cap G_k)|,
\]

and this is $\le |\rho|(E) + |\rho|(F)$. Taking the supremum over systems $\{G_k\}$, we
obtain $|\rho|(E \cup F) \le |\rho|(E) + |\rho|(F)$. Thus $|\rho|$ is additive.

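To prove that $|\rho|$ is completely additive, let $E = \bigcup_{n=1}^\infty E_n$ disjointly. For every
$N$, $\sum_{n=1}^N |\rho|(E_n) = |\rho|(E_1 \cup \cdots \cup E_N) \le |\rho|(E)$, and hence $\sum_{n=1}^\infty |\rho|(E_n) \le
|\rho|(E)$. For the reverse inequality let $\{G_k\}_{k=1}^p$ be a finite collection of disjoint
Borel sets with union $E$. Then $E_n = \bigcup_{k=1}^p (E_n \cap G_k)$ disjointly, and hence

\[
\begin{aligned}
\sum_{k=1}^p |\rho(G_k)| = \sum_{k=1}^p |\rho(E \cap G_k)| &= \sum_{k=1}^p \Big|\sum_{n=1}^\infty \rho(E_n \cap G_k)\Big| \\
&\le \sum_{k=1}^p \sum_{n=1}^\infty |\rho(E_n \cap G_k)| = \sum_{n=1}^\infty \sum_{k=1}^p |\rho(E_n \cap G_k)| \le \sum_{n=1}^\infty |\rho|(E_n).
\end{aligned}
\]

Thus $|\rho|(E) \le \sum_{n=1}^\infty |\rho|(E_n)$, and $|\rho|$ is completely additive.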

The measure $|\rho|$ is certainly finite on $X$ and hence on all compact sets. To see
regularity, we write $\rho = \rho_R + i\rho_I = \rho_R^+ - \rho_R^- + i\rho_I^+ - i\rho_I^-$. Writing a set $E$ as
the disjoint union of $n$ sets $E_j$ and writing out $\rho(E_j)$ according to this expansion
of $\rho$, we see that $|\rho|(E) \le (\rho_R^+ + \rho_R^- + \rho_I^+ + \rho_I^-)(E)$. Each measure on the right
side is regular, and Proposition 11.20 therefore shows that $|\rho|$ is regular.

For the existence of $h$, let us write $\rho$ in terms of its real and imaginary parts
as $\rho = \rho_R + i\rho_I$. If $E$ is a Borel set, then the definitions give $|\rho|(E) \ge
|\rho(E)| \ge |\rho_R(E)|$ and similarly $|\rho|(E) \ge |\rho_I(E)|$. Hence $\rho_R \ll |\rho|$ and
$\rho_I \ll |\rho|$. By the Radon–Nikodym Theorem (Corollary 9.17), there exist
functions $h_R$ and $h_I$ integrable $[d|\rho|]$ such that $\rho_R = h_R\,d|\rho|$ and $\rho_I = h_I\,d|\rho|$.
Thus the $|\rho|$ integrable complex-valued function $h = h_R + ih_I$ has $\rho = h\,d|\rho|$.
We shall show that $h$ has $|h(x)| \le 1$ a.e. $[d|\rho|]$. If the contrary were the case, then
there would exist a constant $c$ with $|c| = 1$ and an $\epsilon > 0$ such that $\mathrm{Re}(ch) \ge 1 + \epsilon$
on a set $E$ of positive $|\rho|$ measure and we would have

\[
\begin{aligned}
\Big|\int_E h\,d|\rho|\Big| = \Big|\int_E ch\,d|\rho|\Big| &\ge \mathrm{Re} \int_E ch\,d|\rho| = \int_E \mathrm{Re}(ch)\,d|\rho| \\
&\ge (1 + \epsilon)\,|\rho|(E) \ge (1 + \epsilon)\,|\rho(E)| = (1 + \epsilon)\Big|\int_E h\,d|\rho|\Big|,
\end{aligned}
\]

a contradiction. Thus $h$ exists as asserted.

The inequality $\left|\int_X f\,d\rho\right| \le \|\rho\|\,\|f\|_{\sup}$ follows from the existence of $h$ since
$\left|\int_X f\,d\rho\right| = \left|\int_X fh\,d|\rho|\right| \le \|fh\|_{\sup} \int_X d|\rho| \le \|f\|_{\sup}\,|\rho|(X) = \|f\|_{\sup}\,\|\rho\|$.

Finally if $\rho$ is real valued, then any Borel set $E$ satisfies $|\rho(E)| =
|\rho^+(E) - \rho^-(E)| \le \rho^+(E) + \rho^-(E)$. If $E$ is the disjoint union of Borel sets
$E_1, \ldots, E_n$, we consequently have

\[
\sum_{j=1}^n |\rho(E \cap E_j)| \le \sum_{j=1}^n \big(\rho^+(E \cap E_j) + \rho^-(E \cap E_j)\big) = \rho^+(E) + \rho^-(E).
\]

Taking the supremum over all decompositions of $E$ of this kind gives $|\rho|(E) \le
\rho^+(E) + \rho^-(E)$. For the reverse inequality let $X = P \cup N$ be a Hahn decomposition
(Theorem 9.15) for $\rho$, so that $\rho^+(E) = \rho(P \cap E)$ and $\rho^-(E) = -\rho(N \cap E)$.
Then $E$ is the disjoint union of $E \cap P$ and $E \cap N$, and thus $\rho^+(E) + \rho^-(E) =
|\rho(E \cap P)| + |\rho(E \cap N)| \le |\rho|(E)$. In other words, $|\rho| = \rho^+ + \rho^-$ as asserted.


Theorem 11.28. The one-one complex-linear map of $M(X, \mathbb{C})$ onto $C(X, \mathbb{C})^*$
given by $\rho \mapsto \ell$ with $\ell(f) = \int_X f\,d\rho$ is norm preserving.


PROOF. If $f$ is in $C(X)$, then Proposition 11.27 gives $|\ell(f)| = \left|\int_X f\,d\rho\right| \le
\|\rho\|\,\|f\|_{\sup}$. Taking the supremum over all $f$ with $\|f\|_{\sup} \le 1$, we obtain $\|\ell\| \le
\|\rho\|$.

For the reverse inequality let $\epsilon > 0$ be given, and choose a finite disjoint
collection of Borel sets $E_1, \ldots, E_n$ with union $X$ such that $\sum_{i=1}^n |\rho(E_i)| \ge
\|\rho\| - \epsilon$. Since $|\rho|$ is regular, we can find compact sets $K_i \subseteq E_i$ such that
$|\rho|(E_i - K_i) < \epsilon/n$ for each $i$.

We shall define disjoint open sets $U_i$ with $K_i \subseteq U_i$ for all $i$. First we find
disjoint open sets $U_1$ and $V_1$ containing $K_1$ and $K_2 \cup \cdots \cup K_n$. Having inductively
chosen disjoint open sets $U_1, \ldots, U_j$ and $V_j$ such that $K_i \subseteq U_i$ for $i \le j$ and
$K_{j+1} \cup \cdots \cup K_n \subseteq V_j$, we use Corollary 10.22 to choose disjoint open subsets
$U_{j+1}$ and $V_{j+1}$ of $V_j$ containing $K_{j+1}$ and $K_{j+2} \cup \cdots \cup K_n$. In this way we obtain
the disjoint open sets $U_1, \ldots, U_n$ with $K_i \subseteq U_i$ for all $i$.

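For $1 \le i \le n$, choose $f_i \in C(X)$ with values in $[0, 1]$ such that $f_i$ is 1 on
$K_i$ and is 0 off $U_i$. Choose $c_i \in \mathbb{C}$ with $|c_i| = 1$ for each $i$ such that
$c_i\rho(E_i) = |\rho(E_i)|$, and define $f_0 = \sum_{i=1}^n c_i f_i$. The function $f_0$ has
$\|f_0\|_{\sup} \le 1$ since the sets $U_i$ are disjoint, and $f_0 = c_i$ on $K_i$. Then

\[
\begin{aligned}
\ell(f_0) = \int_X f_0\,d\rho = \sum_{i=1}^n \int_{E_i} f_0\,d\rho
&= \sum_{i=1}^n \Big(\int_{E_i} c_i\,d\rho + \int_{E_i} (f_0 - c_i)\,d\rho\Big) \\
&= \sum_{i=1}^n |\rho(E_i)| + \sum_{i=1}^n \int_{E_i - K_i} (f_0 - c_i)\,d\rho.
\end{aligned}
\]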
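Hence

\[
\Big|\ell(f_0) - \sum_{i=1}^n |\rho(E_i)|\Big| \le \sum_{i=1}^n \int_{E_i - K_i} |f_0 - c_i|\,d|\rho|
\le 2 \sum_{i=1}^n |\rho|(E_i - K_i) < 2 \sum_{i=1}^n \epsilon/n = 2\epsilon
\]

and

\[
\big|\ell(f_0) - \|\rho\|\big| \le \Big|\ell(f_0) - \sum_{i=1}^n |\rho(E_i)|\Big| + \Big|\sum_{i=1}^n |\rho(E_i)| - \|\rho\|\Big| < 3\epsilon.
\]

Therefore

\[
\|\ell\| \ge \|\ell\|\,\|f_0\|_{\sup} \ge |\ell(f_0)| \ge \|\rho\| - \big|\ell(f_0) - \|\rho\|\big| \ge \|\rho\| - 3\epsilon.
\]

Since $\epsilon$ is arbitrary, $\|\ell\| \ge \|\rho\|$.


5. Problems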


In all problems for this chapter, X is assumed to be a locally compact Hausdorff 
space. Sometimes additional hypotheses are imposed on X. 


1. (a) Prove that if $X$ is $\sigma$-compact, then the $\sigma$-algebra of Borel subsets of $X$
coincides with the $\sigma$-algebra of intersections of $X$ with the Borel subsets of
the one-point compactification $X^*$.
(b) Prove that if $X$ is an uncountable discrete space, then the $\sigma$-algebra of Borel
subsets of $X$ is strictly smaller than the $\sigma$-algebra of intersections of $X$ with
the Borel subsets of the one-point compactification $X^*$.


2. Prove that if $X$ is $\sigma$-compact and $f : X \to \mathbb{C}$ is continuous, then $f$ is a Borel
function.


3. Suppose that $X$ is $\sigma$-compact. Prove that if $\mu$ is a regular Borel measure on $X$
and if $f$ is Borel measurable, then there exists a Baire measurable function $g$
such that $f = g$ except on a Borel set of $\mu$ measure 0.


4. (Lusin's Theorem) Let $X$ be compact, let $\mu$ be a regular Borel measure on $X$, let
$f$ be a Borel function on $X$, and let $\epsilon > 0$ be given. By first considering simple
functions and then passing to the limit via Egoroff's Theorem, prove that there
exists a compact subset $K$ of $X$ with $\mu(K^c) < \epsilon$ such that $f|_K$ is continuous.


5. This problem establishes the rotation invariance of the Borel measure $d\omega$ on the
sphere $S^2 \subseteq \mathbb{R}^3$ obtained from Riemann integration with respect to $\sin\theta_1\,d\theta_1\,d\theta_2$,
where $\theta_1$ and $\theta_2$ are latitude and longitude with $0 \le \theta_1 \le \pi$ and $0 \le \theta_2 \le 2\pi$.
The measure $d\omega$ was constructed by means of the Riesz Representation Theorem
as one of the examples in Section 2.
(a) A rotation in $\mathbb{R}^3$ is the linear function $L$ determined by a matrix $A$ with
$AA^{\mathrm{tr}} = 1$ and $\det A = 1$. For $0 < a < 1 < b < \infty$, let $S_{ab}$ be the subset of
$\mathbb{R}^3$ given in spherical coordinates by $a \le r \le b$, $0 \le \theta_1 \le \pi$, $0 \le \theta_2 \le 2\pi$.
Show that $S_{ab}$ is carried to itself by any such rotation $L$.
(b) For any bounded Borel function $F : S_{ab} \to \mathbb{C}$, let $(LF)(x) = F(L^{-1}x)$ if
$x$ is in $S_{ab}$ and $L$ is a rotation. Prove that $\int_{S_{ab}} LF\,dx = \int_{S_{ab}} F\,dx$.
(c) Let $f : S^2 \to \mathbb{C}$ be any continuous function, and define $(Lf)(\omega) =
f(L^{-1}\omega)$. Extend $f$ to a function $F$ defined on $S_{ab}$ by the definition
$F(r\omega) = f(\omega)$. Prove that $\int_{S_{ab}} F\,dx = \big(\int_a^b r^2\,dr\big)\big(\int_{S^2} f(\omega)\,d\omega\big)$ and
deduce that $\int_{S^2} Lf\,d\omega = \int_{S^2} f\,d\omega$.
(d) Deduce from (c) that $d\omega(L(E)) = d\omega(E)$ for every Borel subset $E$ of $S^2$.

6. Let $X$ be compact.
(a) Let $\{K_\alpha\}$ be a collection of compact subsets of $X$ closed under finite
intersections, and let $K = \bigcap_\alpha K_\alpha$. Prove that every regular Borel measure $\mu$ on
$X$ has the property that $\mu(K) = \inf_\alpha \mu(K_\alpha)$.
(b) If $\mu$ is a nonzero regular Borel measure on $X$ assuming only the values 0
and 1, prove that $\mu$ is a point mass.
(c) If $\mu$ is a nonzero regular Borel measure on $X$ with

\[
\int_X fg\,d\mu = \Big(\int_X f\,d\mu\Big)\Big(\int_X g\,d\mu\Big)
\]

for all $f$ and $g$ in $C(X)$, prove that $\mu$ is a point mass.
(d) If $\ell$ is a positive linear functional on $C(X)$ that is multiplicative in the sense
that $\ell(fg) = \ell(f)\ell(g)$ for all $f$ and $g$ in $C(X)$, prove that $\ell$ is zero or $\ell$ is
evaluation at some point of $X$.

7. This problem continues the investigation of harmonic functions and Poisson
integrals in the unit disk of $\mathbb{R}^2$, following up on Problems 7-8 at the end of
Chapter IX. Problem 8 in that series provides orientation. The new ingredient
for the present problem is weak-star convergence of sequences in $M(S^1, \mathbb{C})$
against $C(S^1)$, where $S^1$ is the unit circle.
(a) State and prove a characterization of the harmonic functions $u(r, \theta)$ on the
open unit disk such that $\sup_{0 \le r < 1} \|u(r, \cdot)\|_1$ is finite.
(b) (Herglotz's Theorem) Prove that if $u(r, \theta)$ is a nonnegative harmonic
function on the open unit disk, then there is a Borel measure $\mu$ on the
circle such that $u(r, \theta) = \int_{S^1} P_r(\theta - \varphi)\,d\mu(\varphi)$.


Problems 8-10 construct a Borel measure $\mu$ on a compact space such that $\mu$ is not
regular. The totally ordered set $\Omega$ of countable ordinals was introduced in Problems
25-33 at the end of Chapter V. Let $\Omega^* = \Omega \cup \{\infty\}$, totally ordered so that every
element of $\Omega$ is less than $\infty$. Give $\Omega^*$ the order topology, as discussed in Problems
25-32 at the end of Chapter X.

8. Prove that $\Omega^*$ is compact Hausdorff.


9. Prove that the class of all relatively closed uncountable subsets of $\Omega$ is closed
under the formation of countable intersections.

10. Define $\mu$ on the Borel sets of $\Omega^*$ to be 1 on those sets $E$ such that $E - \{\infty\}$ contains
a relatively closed uncountable subset of $\Omega$, and put $\mu(E) = 0$ otherwise. Prove
that $\mu$ is a Borel measure that is not regular.


Problems 11-14 concern decomposing any finite Borel measure on a compact $X$ into
a regular Borel measure and a "purely irregular" Borel measure. They make use of
Zorn's Lemma (Section A9 of the appendix). A Borel measure $\mu$ will be called purely
irregular if there is no nonzero regular Borel measure $\nu$ such that $0 \le \nu(E) \le \mu(E)$
for every Borel set $E$.

11. Use Zorn's Lemma to show that any Borel measure on $X$ is the sum of a regular
Borel measure and a purely irregular Borel measure.


12. Prove that if $\nu$ is a regular Borel measure, if $\mu$ is purely irregular, and if $0 \le \mu \le \nu$,
then $\mu = 0$.

13. Deduce from the Jordan decomposition (Theorem 9.14) that the decomposition
of Problem 11 is unique.


14. Prove that the irregular Borel measure constructed in Problem 10 is purely
irregular.

Problems 15-19 concern extension of measures from finite products of compact
metric spaces to countably infinite such products. Let $X$ be a compact metric space,
and for each integer $n \ge 1$, let $X_n$ be a copy of $X$. Define $\Omega^{(N)} = \times_{n=1}^N X_n$
and let $\Omega = \times_{n=1}^\infty X_n$. Each of $\Omega^{(N)}$ and $\Omega$ is given the product topology. If $E$
is a Borel subset of $\Omega^{(N)}$, we can regard $E$ as a subset of $\Omega$ by identifying $E$ with
$E \times \big(\times_{n=N+1}^\infty X_n\big)$. In this way any Borel measure on $\Omega^{(N)}$ can be regarded as a
measure on a certain $\sigma$-subalgebra $\mathcal{F}_N$ of $\mathcal{B}(\Omega)$.

15. Prove that $\bigcup_{n=1}^\infty \mathcal{F}_n = \mathcal{F}$ is an algebra.

16. Let $\nu_n$ be a (regular) Borel measure on $\Omega^{(n)}$ with $\nu_n(\Omega^{(n)}) = 1$, and regard $\nu_n$
as defined on $\mathcal{F}_n$. Suppose for each $n$ that $\nu_n$ agrees with $\nu_{n+1}$ on $\mathcal{F}_n$. Define
$\nu(E)$ for $E$ in $\mathcal{F}$ to be the common value of $\nu_n(E)$ for $n$ large. Prove that $\nu$ is
nonnegative additive, and prove that in a suitable sense $\nu$ is regular on $\mathcal{F}$.

17. Using the kind of regularity established in the previous problem, prove that $\nu$ is
completely additive on $\mathcal{F}$.

18. In view of Problems 16 and 17, $\nu$ extends to a measure on the smallest $\sigma$-algebra
for $\Omega$ containing $\mathcal{F}$. Prove that this $\sigma$-algebra is $\mathcal{B}(\Omega)$.

19. Let $X$ be a 2-point space, and let $\nu_n$ be $2^{-n}$ on each one-point subset of $\Omega^{(n)}$.
Exhibit a homeomorphism of $\Omega$ onto the standard Cantor set in $[0, 1]$ that carries
$\nu$ to the Cantor measure defined in Problems 17-20 at the end of Chapter VI.


CHAPTER XII 


Hilbert and Banach Spaces 


Abstract. This chapter develops the beginnings of abstract functional analysis, a subject designed 
to study properties of functions by treating the functions as the members of a space and formulating 
the properties as properties of the space. 

Section 1 defines Banach spaces as complete normed linear spaces and gives a number of examples
of these. The space of bounded linear operators from one normed linear space to another is a normed
linear space, and it is a Banach space if the range is a Banach space.

Sections 2-3 concern Hilbert spaces. These are Banach spaces whose norms are induced by 
inner products. Section 2 shows that closed vector subspaces of such a space have orthogonal 
complements, and it shows the role of orthonormal bases for such a space. Section 3 concentrates 
on bounded linear operators from a Hilbert space to itself and constructs the adjoint of each such 
operator. 

Sections 4-6 prove the three main abstract theorems about the norm topology of general normed
linear spaces: the Hahn–Banach Theorem, the Uniform Boundedness or Banach–Steinhaus Theorem,
and the Interior Mapping Principle. A number of consequences of these theorems are given.
The second and third of the theorems require some hypothesis of completeness.


1. Definitions and Examples 


Functional analysis puts into practice an idea from the early twentieth century, 
that sometimes properties of functions become clearer when the functions are 
regarded as the members of a space and the properties are formulated as properties 
of the space. We encountered some simple examples of this situation already in 
Chapter II in the examples of metric spaces. Uniform convergence was encoded in 
the metric on spaces of functions, and other kinds of convergence were captured by 
other metrics. In Chapter V we introduced the spaces $L^1(X)$, $L^2(X)$, and $L^\infty(X)$
of functions (or really equivalence classes of functions), all of which were proved
to be complete. The property of completeness was a useful property of the space
as a whole that led, for one thing, to the Riesz–Fischer Theorem in Chapter VI.
More complicated properties led us to various kinds of differentiability of integrals
in $\mathbb{R}^N$ in Chapters VI and IX and to boundedness of the Hilbert transform in
Chapter IX. The development of measure theory on locally compact Hausdorff
spaces in Chapter XI rested on an analysis of positive linear functionals on the
space of continuous functions of compact support.

The different spaces (of functions, measures, and whatever else) that arise
in this way have some properties in common, and we study them in this chapter in
a setting that emphasizes these common properties. We shall work with normed
linear spaces, which were defined in Section V.9. With such spaces the field of
scalars $\mathbb{F}$ can be either $\mathbb{R}$ or $\mathbb{C}$. Recall then that a normed linear space $X$ is a
vector space over $\mathbb{F}$ with a norm, i.e., a function $\|\cdot\|$ from $X$ to $[0, +\infty)$ such
that $\|x\| \ge 0$ with equality if and only if $x = 0$, $\|cx\| = |c|\,\|x\|$ if $c$ is a scalar, and
$\|x + y\| \le \|x\| + \|y\|$. The norm yields a metric $d(x, y) = \|x - y\|$, and we can
then speak of the norm topology on $X$. Proposition 5.55 showed that addition and
scalar multiplication are continuous, that the closure of any vector subspace of $X$
is a vector subspace, and that the set of all finite linear combinations of members
of a subset $S$ of $X$ is dense in the smallest closed subspace containing $S$.

Completeness plays an increasingly important role as one studies such spaces, 
and it is customary to introduce a definition to incorporate this notion: a normed 
linear space X is a Banach space if X is complete as a metric space. The metric- 
space completion of a normed linear space is automatically a normed linear space 
that is complete, hence is a Banach space. 

Let us consider some examples of normed linear spaces, some old and some 
new. Except as indicated, they will all be Banach spaces. 


EXAMPLES. 

(1) Euclidean space $\mathbb{R}^n$ and complex Euclidean space $\mathbb{C}^n$, written briefly as 
$\mathbb{F}^n$. The space consists of $n$-tuples of scalars $a = (a_1, \dots, a_n)$ with $\|a\|$ equal 
to the Euclidean norm $|a|$ of Section II.1, namely $\|a\| = \big(\sum_{k=1}^{n} |a_k|^2\big)^{1/2}$. It was 
remarked in Section II.7 that these spaces are complete, hence are Banach spaces. 


(2) Finite-dimensional normed linear spaces. It can be shown that each finite- 
dimensional normed linear space X is complete. In fact, any linear map carrying 
a vector-space basis of X to a vector-space basis of some $\mathbb{F}^n$, normed as in the 
previous example, can be shown to be uniformly continuous with a uniformly 
continuous inverse, and the completeness of X follows. 


(3) B(S), the space of bounded scalar-valued functions on a nonempty set S 
with the supremum norm, defined in Section II.1. Proposition 2.44 establishes 
the completeness. 


(4) C(S), the space of bounded continuous scalar-valued functions on a metric 
space or topological space S, defined in Section II.4 in the metric case and Section 
X.5 in general. The norm is the supremum norm. Corollary 2.45 and Proposition 
10.30 establish the completeness of C(S). When S is locally compact Hausdorff, 
we defined $C_0(S)$ in Section XI.4 to be the subspace of C(S) of all members 
vanishing at infinity. This is complete. However, the subspace $C_{\mathrm{com}}(S)$ of 
continuous scalar-valued functions of compact support is usually not complete. 
(5) $L^p(S, \mathcal{A}, \mu)$, the space of equivalence classes of $p$th-power integrable 
functions on a measure space $(S, \mathcal{A}, \mu)$. This is a normed linear space for 
$1 \le p < \infty$ with norm $\|f\|_p = \big(\int_S |f(s)|^p \, d\mu(s)\big)^{1/p}$. These spaces were 
introduced in Section V.9 for p = 1 and p = 2 and in Section IX.1 for general p. 
Theorem 5.59 established the completeness for p = 1 and p = 2, and Theorem 
9.6 established the completeness for general p. 


(6) $L^\infty(S, \mathcal{A}, \mu)$, the space of equivalence classes of essentially bounded 
functions on a measure space $(S, \mathcal{A}, \mu)$. This is a normed linear space with 
norm the essential supremum norm. This space was introduced in Section V.9 
and was proved to be complete in Theorem 5.59. 


(7) Sequence spaces $c$, $c_0$, and $\ell^p_n$ and $\ell^p$ for $1 \le p \le \infty$. These are 
special cases of various examples above. The space $\ell^p_n$ is $L^p(S, \mathcal{A}, \mu)$ when 
$S = \{1, 2, \dots, n\}$, $\mathcal{A}$ is the set of all subsets, and $\mu$ is counting measure, the norm 
being $\|(a_1, \dots, a_n)\| = \big(\sum_{k=1}^{n} |a_k|^p\big)^{1/p}$ if $p < \infty$ and being $\|(a_1, \dots, a_n)\| = 
\max_{1 \le k \le n} |a_k|$ if $p = \infty$. The space $\ell^p_n$ specializes to $\mathbb{F}^n$ when $p = 2$. The 
space $\ell^p$ is the version of $\ell^p_n$ when S is the set of positive integers; the members 
of this space are thus all sequences for which the norm is finite. The sequence 
spaces $c$ and $c_0$ can be regarded as subspaces of C(S) when S is the set of positive 
integers. The space $c$ consists of all convergent sequences, and $c_0$ is the space of 
sequences vanishing at infinity; in both cases the norm is the supremum norm. 
All these examples are Banach spaces. They tend to be useful in testing guesses 
about properties of normed linear spaces. We shall not need them explicitly, and 
this traditional notation for them will not recur after the end of this section. 


(8) M(S), S being a compact Hausdorff space. This is the space of regular 
Borel signed or complex measures on S, introduced as M(S, R) or M(S, C) in 
Section XI.4. The norm is the total-variation norm. Theorems 11.26 and 11.28 
identify these spaces with duals of spaces of continuous functions, and Proposition 
12.1 below will show that they are complete as a consequence. 


(9) $C^N([a, b])$, the space of scalar-valued functions on a bounded interval 
[a, b] with N bounded derivatives, the norm being 

$$\|f\| = \sum_{j=0}^{N} \sup_{a \le s \le b} |f^{(j)}(s)|.$$


It is shown in Problem 2 at the end of the chapter that this space is complete. This 
space is an indication of how normed linear spaces can carry information about 
derivatives. Indeed, normed linear spaces carrying information about derivatives 
play a significant role in the subject of partial differential equations.¹ 


¹ This is one of the themes of the companion volume Advanced Real Analysis. 
(10) $H^\infty(D)$, the space of bounded functions in the open unit disk $D = 
\{|z| < 1\}$ in $\mathbb{C}$ such that the function is given by a convergent power series. The 
norm is the supremum norm. It is shown in Problem 3 at the end of the chapter 
that this space is complete. 


(11) A(D), the space of bounded continuous functions on the closed unit disk 
whose restriction to the open unit disk is given by a convergent power series. The 
norm is the supremum norm. It is shown in Problem 3 at the end of the chapter 
that this space is complete. 


Two further kinds of normed linear spaces are worth mentioning now. One is 
that any real or complex inner-product space X in the sense of Section II.1 gives 
an example of a normed linear space. Recall that an inner product on X is a 
function $(\cdot, \cdot)$ from $X \times X$ to $\mathbb{F}$ that is linear in the first variable, is conjugate 
linear in the second variable, is symmetric if $\mathbb{F} = \mathbb{R}$ or Hermitian symmetric if 
$\mathbb{F} = \mathbb{C}$, and has $(x, x) \ge 0$ for all x with equality if and only if $x = 0$. Such 
an inner product satisfies the Schwarz inequality $|(x, y)| \le (x, x)^{1/2}(y, y)^{1/2}$, 
according to Lemma 2.2, and then the definition $\|x\| = (x, x)^{1/2}$ makes X into a 
normed linear space, according to Proposition 2.3. 

As a normed linear space, an inner-product space may or may not be complete. 
Any space $L^2(S, \mathcal{A}, \mu)$, with $(f, g) = \int_S f\bar{g}\,d\mu$, is an example in which the 
associated normed linear space is complete. An inner-product space whose 
associated normed linear space is complete is called a Hilbert space. 

The other kind of normed linear space worth mentioning now involves bounded 
linear operators. Recall from Section V.9 that a linear function $L : X \to Y$ 
between two normed linear spaces with respective norms $\|\cdot\|_X$ and $\|\cdot\|_Y$ is 
often called a linear operator. Proposition 5.57 showed that a linear operator 
L is continuous at a point if and only if it is continuous everywhere, if and 
only if it is uniformly continuous, if and only if it is bounded in the sense that 
$\|L(x)\|_Y \le M\|x\|_X$ for some constant M and all x in X. The least such constant 
M is called the operator norm of L, written $\|L\|$. We can define addition and 
scalar multiplication on bounded linear operators from X to Y by addition and 
scalar multiplication of their values: 


$$(L_1 + L_2)(x) = L_1(x) + L_2(x) \qquad\text{and}\qquad (cL)(x) = cL(x).$$


Then $L_1 + L_2$ and $cL$ are linear operators by the elementary theory of vector 
spaces, and the inequalities 

$$\|(L_1 + L_2)(x)\|_Y = \|L_1(x) + L_2(x)\|_Y \le \|L_1(x)\|_Y + \|L_2(x)\|_Y \le \|L_1\|\,\|x\|_X + \|L_2\|\,\|x\|_X = (\|L_1\| + \|L_2\|)\,\|x\|_X$$

and

$$\|(cL)(x)\|_Y = \|cL(x)\|_Y = |c|\,\|L(x)\|_Y \le |c|\,\|L\|\,\|x\|_X$$

show that $L_1 + L_2$ and $cL$ are bounded with $\|L_1 + L_2\| \le \|L_1\| + \|L_2\|$ and 
$\|cL\| \le |c|\,\|L\|$. Applying the latter conclusion to $c^{-1}$ when $c \ne 0$ gives $\|L\| = 
\|c^{-1}(cL)\| \le |c|^{-1}\|cL\| \le |c|^{-1}|c|\,\|L\| = \|L\|$, and we conclude that $\|cL\| = 
|c|\,\|L\|$. Since it is plain that $\|L\| \ge 0$ with equality if and only if $L = 0$, the set 
of bounded linear operators from X to Y, with the operator norm, is a normed 
linear space. We denote this normed linear space by B(X, Y). 
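The norm properties just derived can be checked numerically in a small case. The following is only a sketch under stated assumptions: X = Y is taken to be $\mathbb{R}^2$ with the supremum norm, and the code uses the standard fact (not proved in this section) that the operator norm of a matrix relative to the supremum norm is its largest absolute row sum. The helper names `sup_norm`, `apply`, `op_norm_sup`, `add`, and `scale` are hypothetical and chosen for illustration.

```python
def sup_norm(x):
    """Supremum norm of a vector in F^n."""
    return max(abs(t) for t in x)

def apply(L, x):
    """Apply the matrix L (a list of rows) to the vector x."""
    return [sum(a * t for a, t in zip(row, x)) for row in L]

def op_norm_sup(L):
    """Operator norm of L when domain and range carry the sup norm.

    Assumption: this uses the standard formula 'largest absolute row sum'.
    """
    return max(sum(abs(a) for a in row) for row in L)

def add(L1, L2):
    """Pointwise sum of two operators given as matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(L1, L2)]

def scale(c, L):
    """Scalar multiple cL of an operator given as a matrix."""
    return [[c * a for a in row] for row in L]

L1 = [[1.0, -2.0], [0.5, 3.0]]
L2 = [[-1.0, 4.0], [2.0, -3.0]]
c = -2.5

# Triangle inequality ||L1 + L2|| <= ||L1|| + ||L2||.
print(op_norm_sup(add(L1, L2)), "<=", op_norm_sup(L1) + op_norm_sup(L2))
# Homogeneity ||cL|| = |c| ||L||.
print(op_norm_sup(scale(c, L1)), "==", abs(c) * op_norm_sup(L1))
# Boundedness ||L(x)|| <= ||L|| ||x|| on a sample vector.
x = [1.0, 1.0]
print(sup_norm(apply(L1, x)), "<=", op_norm_sup(L1) * sup_norm(x))
```

For the sample vector chosen here the bound $\|L(x)\|_Y \le \|L\|\,\|x\|_X$ happens to hold with equality, which illustrates that the operator norm is the least constant M that works.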


Proposition 12.1. If X and Y are normed linear spaces and if Y is complete, 
then the normed linear space B(X, Y) is a Banach space. 


REMARKS. In the special case in which Y is the set F of scalars, the linear 
operators are called linear functionals, in terminology we have used repeatedly. 
The normed linear space $\mathbb{F} = \mathbb{F}^1$ is complete, and therefore the normed linear 
space of bounded linear functionals on X is a Banach space. The space of 
bounded linear functionals is called the dual space of X and is denoted by $X^*$. 
More explicitly the norm of an element $x^*$ of $X^*$ is² 

$$\|x^*\| = \sup_{\|x\| \le 1} |x^*(x)|.$$


Proposition 12.1 is implicitly saying that X* is always complete. 


PROOF. Let $\{L_n\}$ be a Cauchy sequence in B(X, Y). Since in any metric space 
the members of a Cauchy sequence are at a bounded distance from any particular 
element, the sequence $\{\|L_n\|\}$ is bounded. Let $C = \sup_n \|L_n\|$. 

If x is in X, then $\{L_n(x)\}$ is a Cauchy sequence since $\|L_n(x) - L_{n'}(x)\|_Y \le 
\|L_n - L_{n'}\|\,\|x\|_X$. By completeness of Y, $L(x) = \lim_n L_n(x)$ exists. Continuity of 
addition and scalar multiplication in Y implies that $L(x + x') = \lim_n L_n(x + x') = 
\lim_n (L_n(x) + L_n(x')) = \lim_n L_n(x) + \lim_n L_n(x') = L(x) + L(x')$ and that 
$L(cx) = \lim_n L_n(cx) = \lim_n (cL_n(x)) = c \lim_n L_n(x) = cL(x)$. Therefore L is 
a linear operator. 

For boundedness of L, we have $\|L_n(x)\|_Y \le \|L_n\|\,\|x\|_X \le C\|x\|_X$ for all n. 
Hence continuity of the norm function implies that $\|L(x)\|_Y = \|\lim_n L_n(x)\|_Y \le 
\liminf_n \|L_n(x)\|_Y \le C\|x\|_X$, and L is bounded with $\|L\| \le C$. 

To complete the proof, we show that $\|L_n - L\| \to 0$. Assuming the contrary, 
we can pass to a subsequence and then change notation so that $\|L_n - L\| \ge \epsilon$ 
for some $\epsilon > 0$ for all n. Then for each n, we can find $x_n$ in X with $\|x_n\|_X = 1$ 


² A superscript * has also been used in this book to indicate a one-point compactification, but 
there need never be any confusion about this notation. One-point compactifications arise in practice 
only for locally compact Hausdorff spaces, and one can show that a normed linear space is locally 
compact only if it is finite dimensional. For finite-dimensional normed linear spaces it is always 
clear from the context whether * refers to the dual space or to the one-point compactification. 
such that $\|L_n(x_n) - L(x_n)\|_Y \ge \epsilon/2$. Choose and fix N so that $m \ge N$ implies 
$\|L_m - L_N\| \le \epsilon/4$. Whenever $m \ge N$, the triangle inequality gives 

$$\|L_m(x_N) - L(x_N)\|_Y \ge \|L_N(x_N) - L(x_N)\|_Y - \|L_N(x_N) - L_m(x_N)\|_Y \ge \tfrac{\epsilon}{2} - \|L_N - L_m\|\,\|x_N\|_X = \tfrac{\epsilon}{2} - \|L_N - L_m\| \ge \tfrac{\epsilon}{4},$$

in contradiction to the fact that $\lim_m L_m(x_N) = L(x_N)$. 


EXAMPLES OF DUAL SPACES. 

(1) $L^p(S, \mathcal{A}, \mu)^* \cong L^{p'}(S, \mathcal{A}, \mu)$ if $1 \le p < \infty$, $\mu$ is $\sigma$-finite, and $p'$ is the 
dual index with $\frac{1}{p} + \frac{1}{p'} = 1$, according to the Riesz Representation Theorem 
(Theorem 9.19). Specifically to each $x^*$ in $L^p(S, \mathcal{A}, \mu)^*$ corresponds a unique g 
in $L^{p'}(S, \mathcal{A}, \mu)$ with $x^*(f) = \int_S fg\,d\mu$ for all f in $L^p(S, \mathcal{A}, \mu)$, and this g has 
$\|x^*\| = \|g\|_{p'}$. It can be shown that the hypothesis of $\sigma$-finiteness of $\mu$ can be 
dropped if $1 < p < \infty$, but Problem 4 at the end of Chapter IX shows that the 
hypothesis cannot be completely dropped for $p = 1$. 


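(2) $(\ell^p_n)^* \cong \ell^{p'}_n$ and $(\ell^p)^* \cong \ell^{p'}$ for $1 \le p < \infty$ if $p'$ is the dual index. This 
is a special case of Example 1. In particular, the first of these duality results for 
$p = 2$ says that $(\mathbb{R}^n)^* \cong \mathbb{R}^n$ and $(\mathbb{C}^n)^* \cong \mathbb{C}^n$. 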

(3) $C(S)^* \cong M(S)$ if S is a compact Hausdorff space, according to Theorems 
11.26 and 11.28. Specifically to each $x^*$ in $C(S)^*$ corresponds a unique $\rho$ in 
M(S) with $x^*(f) = \int_S f\,d\rho$ for all f in C(S), and this $\rho$ has $\|x^*\| = \|\rho\|$. Since 
M(S) is in this way identified as the dual space of some normed linear space, it 
follows from Proposition 12.1 that M(S) is a Banach space. 

(4) $(\ell^\infty_n)^* \cong \ell^1_n$ and $(c_0)^* \cong \ell^1$. The isomorphism $(\ell^\infty_n)^* \cong \ell^1_n$ is the special 
case of Example 3 in which $S = \{1, \dots, n\}$. To see the isomorphism $(c_0)^* \cong \ell^1$, 
we take S to be the set of positive integers and form the one-point compactification 
$S^*$. The continuous scalar-valued functions on $S^*$, with their supremum norm, 
can be identified with the normed linear space $c$ of convergent sequences. Thus 
Example 3 in this setting says that $c^* \cong M(S^*)$. The members of $c_0$ are the 
members of $c$ that vanish at $\infty$, and any point mass at $\infty$ in a member of $M(S^*)$ 
has no effect on the subspace $c_0$. It readily follows that the dual of $c_0$ consists of 
the members of $M(S^*)$ with no point mass at $\infty$, and these elements, with their 
norm, may be identified with $\ell^1$. 


From one point of view, Hilbert spaces are particularly simple Banach spaces, 
and we shall study them first. The geometry of Hilbert space will be the topic of 
the next section, and the section after that will give a brief introduction to bounded 
linear operators from a Hilbert space to itself. 
2. Geometry of Hilbert Space 


Hilbert spaces were defined in Section 1 as complete normed linear spaces whose 
norms arise from an inner product. Euclidean space $\mathbb{R}^n$ and complex Euclidean 
space $\mathbb{C}^n$ are examples, and every space $L^2(S, \mathcal{A}, \mu)$ with $(f, g) = \int_S f\bar{g}\,d\mu$ 
is a Hilbert space. We shall see in this section that every Hilbert space shares 
many geometric facts in common with the finite-dimensional examples $\mathbb{R}^n$ and 
$\mathbb{C}^n$. The expansion of square integrable functions on $[-\pi, \pi]$ in Fourier series 
will be seen to be an example of expansion of all members of a Hilbert space in 
terms of an “orthonormal basis.” 

Let H be a real or complex Hilbert space with inner product $(\cdot, \cdot)$ and with 
norm $\|\cdot\|$ given by $\|u\| = (u, u)^{1/2}$. Lemma 2.2 shows that H satisfies the 
Schwarz inequality 

$$|(u, v)| \le \|u\|\,\|v\| \qquad\text{for all } u \text{ and } v \text{ in } H.$$

The Schwarz inequality implies the estimate 

$$|(u, v) - (u_0, v_0)| \le |(u - u_0, v)| + |(u_0, v - v_0)| \le \|u - u_0\|\,\|v\| + \|u_0\|\,\|v - v_0\|,$$

from which it follows that the inner product is a continuous function of two 
variables. 

We shall make frequent use of the formula 

$$\|u + v\|^2 = \|u\|^2 + 2\,\mathrm{Re}(u, v) + \|v\|^2,$$

which is what one combines with the Schwarz inequality to prove the triangle 
inequality for the norm. With the additional hypothesis that $(u, v) = 0$, this 
formula reduces to the Pythagorean Theorem 

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$

Direct expansion of the norms squared in terms of the inner product shows that 
H satisfies the parallelogram law 

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2 \qquad\text{for all } u \text{ and } v \text{ in } H.$$


Actually, there is a converse to this formula, due to Jordan and von Neumann, 
whose details are left to Problems 19-24 at the end of the chapter: a Banach space 
is a Hilbert space if its norm satisfies the parallelogram law. The idea is that the 
inner product in a Hilbert space can be computed from the identity 

$$(u, v) = \frac{1}{4} \sum_k i^k \|u + i^k v\|^2,$$

where the sum extends for $k \in \{0, 2\}$ if the scalars are real and extends for 
$k \in \{0, 1, 2, 3\}$ if the scalars are complex. This identity goes under the name 
polarization. For the result of Jordan and von Neumann, one defines (u, v) by this 
formula, shows that the result is an inner product, and proves that $\|u\|^2 = (u, u)$. 
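As an illustration only, the polarization identity can be tested numerically in $\mathbb{C}^2$ and $\mathbb{R}^2$ with the standard inner product, which is linear in the first variable and conjugate linear in the second. The function names `inner`, `norm_sq`, and `polarize` below are hypothetical and not notation from the text.

```python
def inner(u, v):
    """Standard inner product: linear in u, conjugate linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm_sq(u):
    """Squared norm ||u||^2 = (u, u)."""
    return inner(u, u).real

# The four powers i^0, i^1, i^2, i^3, written out exactly.
I_POWERS = (1, 1j, -1, -1j)

def polarize(u, v, complex_scalars=True):
    """Recover (u, v) from norms alone via (1/4) * sum_k i^k ||u + i^k v||^2."""
    ks = (0, 1, 2, 3) if complex_scalars else (0, 2)
    total = 0
    for k in ks:
        ik = I_POWERS[k]
        w = [a + ik * b for a, b in zip(u, v)]
        total += ik * norm_sq(w)
    return total / 4

u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
print(inner(u, v), polarize(u, v))            # the two values should agree

x = [1.0, -2.0]
y = [3.0, 0.5]
print(inner(x, y).real, polarize(x, y, complex_scalars=False))
```

In the real case the sum has only the two terms $k = 0$ and $k = 2$, so the identity reduces to $(u, v) = \frac{1}{4}(\|u + v\|^2 - \|u - v\|^2)$.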

The following lemma, which makes use of the completeness, is the key to all 
the geometry. 


Lemma 12.2. If M is a closed vector subspace of the Hilbert space H and if 
u is in H, then there is a vector v in M with 


$$\|u - v\| = \inf_{w \in M} \|u - w\|.$$


REMARK. Examination of the proof will show that we do not make full use 
of the assumption that M is closed under addition and scalar multiplication, only 
that M is closed under passage to convex combinations, i.e., that x and y in M 
imply that $tx + (1 - t)y$ is in M for all t with $0 \le t \le 1$. Thus it is enough to 
assume that M is a closed convex set, not necessarily a closed vector subspace. 


PROOF. Let $d = \inf_{w \in M} \|u - w\|$, and choose a sequence $\{w_n\}$ in M with 
$\|u - w_n\| \to d$. By the parallelogram law, 

$$\|2u - (w_n + w_m)\|^2 + \|w_n - w_m\|^2 = 2\big(\|u - w_n\|^2 + \|u - w_m\|^2\big) \longrightarrow 4d^2.$$

Since $\frac{1}{2}(w_n + w_m)$ is in M, 

$$\|2u - (w_n + w_m)\|^2 = 4\,\|u - \tfrac{1}{2}(w_n + w_m)\|^2 \ge 4d^2.$$

We conclude that $\|w_n - w_m\|^2 \to 0$, and $\{w_n\}$ is Cauchy. By completeness of 
H, $\{w_n\}$ is convergent. If $v = \lim w_n$, then v is in M since M is topologically 
closed. Since $\|u - w_n\| \to d$, continuity of the norm gives $\|u - v\| = d$. 


Two vectors u and v in H are said to be orthogonal if (u, v) = 0. The set of 
all vectors orthogonal to a subset M of H is denoted by $M^\perp$. In symbols, 

$$M^\perp = \{u \in H \mid (u, v) = 0 \text{ for all } v \in M\}.$$

We see by inspection that $M^\perp$ is a closed vector subspace. Moreover, $M \cap M^\perp = 0$ 
since any u in $M \cap M^\perp$ must have $(u, u) = 0$. The subspace $M^\perp$ will be of greatest 
interest when M is a closed vector subspace, as a consequence of the following 
proposition. 
Proposition 12.3 (Projection Theorem). If M is a closed vector subspace of 
the Hilbert space H, then every u in H decomposes uniquely as $u = v + w$ with 
v in M and w in $M^\perp$. 


REMARKS. One writes $H = M \oplus M^\perp$ to express this unique decomposition 
of vector spaces. Because of this proposition, $M^\perp$ is often called the orthogonal 
complement of the closed vector subspace M. It is essential that M be closed 
in this proposition. In fact, consider the vector subspace M of polynomials in 
$L^2([0, 1])$. This is dense as a consequence of the Weierstrass Approximation 
Theorem, and consequently no $L^2$ function other than 0 can be in $M^\perp$. Thus not 
every member of $L^2$ is the sum of a member of M and a member of $M^\perp$. 


PROOF. Uniqueness follows from the fact that $M \cap M^\perp = 0$. For existence let 
u be in H, and choose v in M by Lemma 12.2 with $\|u - v\| = \inf_{w \in M} \|u - w\|$. 
If m is any member of M with $\|m\| = 1$, then the vector $v + (u - v, m)m$ is in 
M and the formula $\|x - y\|^2 = \|x\|^2 - 2\,\mathrm{Re}(x, y) + \|y\|^2$ gives 

$$\begin{aligned} \|u - v\|^2 &\le \|u - v - (u - v, m)m\|^2 \\ &= \|u - v\|^2 - 2|(u - v, m)|^2 + |(u - v, m)|^2 \\ &= \|u - v\|^2 - |(u - v, m)|^2. \end{aligned}$$

Hence $(u - v, m) = 0$. Since every nonzero member of M is a scalar multiple of 
a member with $\|m\| = 1$, $u - v$ is in $M^\perp$. 


Corollary 12.4. If M is a closed vector subspace of the Hilbert space H, then 
$M^{\perp\perp} = M$. 


PROOF. From the definition we see that $M \subseteq M^{\perp\perp}$. If u is in $M^{\perp\perp}$, write $u = 
m + m^\perp$ with $m \in M$ and $m^\perp \in M^\perp$ by Proposition 12.3. Then $0 = m^\perp + (m - u)$ 
with $m^\perp \in M^\perp$ and $m - u \in M^{\perp\perp}$. By the uniqueness in the decomposition 
$H = M^\perp \oplus M^{\perp\perp}$ of Proposition 12.3, $m^\perp = 0$ and $m - u = 0$. Therefore $u = m$ 
is in M, and $M^{\perp\perp} = M$. 


Theorem 12.5 (Riesz Representation Theorem). If $\ell$ is a continuous linear 
functional on the Hilbert space H, then there exists a unique v in H with $\ell(u) = 
(u, v)$ for all u in H. This vector v has the property that $\|\ell\| = \|v\|$. 


REMARKS. It is instructive to compare this result with the version of the 
Riesz Representation Theorem in Theorem 9.19, which applies to $L^p(S, \mathcal{A}, \mu)$ 
for $1 \le p < \infty$ and in particular to $L^2(S, \mathcal{A}, \mu)$. That theorem associates to a 
continuous linear functional $\ell$ on this $L^2$ space a member g of the space such that 
$\ell(f) = \int_S fg\,d\mu$ for all f in the space. The present theorem, applied with $H = 
L^2(S, \mathcal{A}, \mu)$, instead yields a member v of the space such that $\ell(f) = \int_S f\bar{v}\,d\mu$ 
for all f in the space. The connection, of course, is that the function g is $\bar{v}$. The 
space $L^2(S, \mathcal{A}, \mu)$ has a canonically defined notion of complex conjugation, but 
an abstract Hilbert space does not. Because of the existence of this canonical 
conjugation, Theorem 9.19 gives us a canonical linear isometry of $L^2(S, \mathcal{A}, \mu)^*$ 
onto $L^2(S, \mathcal{A}, \mu)$, whereas Theorem 12.5 gives us a canonical isometry that is 
merely conjugate linear. 


PROOF. Uniqueness is immediate since if $(u, v) = 0$ for all u, then $(u, v) = 0$ 
for $u = v$, and hence $v = 0$. Let us prove existence. If $\ell = 0$, take $v = 0$. 
Otherwise let $M = \{u \mid \ell(u) = 0\}$. This is a vector subspace since $\ell$ is linear, 
and it is closed since $\ell$ is continuous. By Proposition 12.3 and the fact that M is 
not all of H, $M^\perp$ contains a nonzero vector w. This vector w must have $\ell(w) \ne 0$ 
since $M \cap M^\perp = 0$, and we let v be the member of $M^\perp$ given by 

$$v = \frac{\overline{\ell(w)}}{\|w\|^2}\, w.$$

For any u in H, we have $\ell\big(u - \frac{\ell(u)}{\ell(w)} w\big) = 0$, and hence $u - \frac{\ell(u)}{\ell(w)} w$ is in M. Since 
v is in $M^\perp$, $u - \frac{\ell(u)}{\ell(w)} w$ is orthogonal to v. Thus 

$$(u, v) = \Big(\frac{\ell(u)}{\ell(w)}\, w,\ v\Big) = \Big(\frac{\ell(u)}{\ell(w)}\, w,\ \frac{\overline{\ell(w)}}{\|w\|^2}\, w\Big) = \frac{\ell(u)}{\ell(w)} \cdot \frac{\ell(w)}{\|w\|^2} \cdot \|w\|^2 = \ell(u).$$

This proves existence. 

For the norm equality every u in H has $|\ell(u)| = |(u, v)| \le \|u\|\,\|v\|$ by 
the Schwarz inequality. Taking the supremum over all u with $\|u\| \le 1$ gives 
$\|\ell\| \le \|v\|$. On the other hand, $|(u, v)| = |\ell(u)| \le \|\ell\|\,\|u\|$; putting $u = v$ gives 
$\|v\| \le \|\ell\|$. Thus $\|\ell\| = \|v\|$. 
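The construction in the existence proof can be tried out in $\mathbb{C}^3$. The sketch below is an illustration under assumptions that are not part of the text: the functional is $\ell(u) = \sum_j a_j u_j$ for a fixed coefficient vector `a`, and the vector `w` orthogonal to the kernel is taken to be a scalar multiple of the conjugate of `a`. The names `inner`, `ell`, and `riesz_vector` are hypothetical.

```python
import math

def inner(u, v):
    """Standard inner product: linear in u, conjugate linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

# Coefficients of a sample continuous linear functional on C^3 (an assumption).
a = [2 - 1j, 0.5j, -3.0]

def ell(u):
    """The functional ell(u) = sum_j a_j u_j."""
    return sum(c * t for c, t in zip(a, u))

def riesz_vector(functional, w):
    """Formula from the proof: v = conj(ell(w)) / ||w||^2 * w.

    w must be a nonzero vector orthogonal to the kernel of the functional.
    """
    factor = complex(functional(w)).conjugate() / inner(w, w).real
    return [factor * t for t in w]

# Any nonzero multiple of conj(a) is orthogonal to the kernel of ell.
w = [(1 + 2j) * complex(c).conjugate() for c in a]
v = riesz_vector(ell, w)

u = [1 - 1j, 2.0, 3 + 4j]
print(ell(u), inner(u, v))                                  # should agree
print(math.sqrt(inner(v, v).real), math.sqrt(inner(a, a).real))  # ||v|| and ||a||
```

The resulting `v` does not depend on which nonzero multiple of the conjugate of `a` is used for `w`, in line with the uniqueness statement of the theorem.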


A subset S of H is orthonormal if each vector in S has norm 1 and if each 
pair of distinct vectors in S is orthogonal. For example, relative to the inner 
product $(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f\bar{g}\,dx$, the functions $x \mapsto e^{inx}$ are orthonormal as n 
varies through the integers. An orthonormal set S is linearly independent; in 
fact, if $v_1, \dots, v_n$ are members of S with $\sum_i c_i v_i = 0$, then the computation 
$0 = (v_j, \sum_i c_i v_i) = \sum_i \bar{c}_i (v_j, v_i) = \bar{c}_j \|v_j\|^2 = \bar{c}_j$ shows that $c_j = 0$ for all j. 

We encountered other examples of orthogonal sets, beyond the functions $e^{inx}$, 
in Chapter IV in connection with solving certain ordinary differential equations. 
Such an orthogonal set becomes orthonormal when each member is scaled by 
the reciprocal of its norm. One example was the system of Legendre polynomials 
$P_n(t)$, which were introduced in Section IV.8: the differential equation 
$(1 - t^2)y'' - 2ty' + n(n + 1)y = 0$ has polynomial solutions $y(t)$ that are unique 
up to a scalar, and $P_n(t)$ is a suitably normalized polynomial solution, necessarily 
of degree n. These can be shown to be orthogonal³ in $L^2([-1, 1], dt)$. 
Another example was constructed from the Bessel function 

$$J_0(t) = \sum_{n=0}^{\infty} \frac{(-1)^n t^{2n}}{2^{2n}(n!)^2},$$

which was defined in Section IV.8. There are infinitely many distinct positive real 
numbers $k_n$ such that $J_0(k_n) = 0$, and it can be shown that the functions 
$x \mapsto J_0(k_n x)$ are orthogonal⁴ in $L^2([0, 1], x\,dx)$. 

If an ordered set of n linearly independent vectors in H is given, the 
Gram-Schmidt orthogonalization process, which appears in Problem 6 at the end of 
the present chapter, gives an algorithm for replacing the set with an orthonormal 
set having the same linear span. 
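The process can be sketched in code for vectors in $\mathbb{F}^n$ with the standard inner product. This is a minimal illustration and not a transcription of Problem 6: it uses the variant often called modified Gram-Schmidt, which subtracts each projection from the running remainder and agrees with the classical process in exact arithmetic. The names `inner` and `gram_schmidt` and the tolerance `1e-12` are assumptions made for the example.

```python
import math

def inner(u, v):
    """Standard inner product: linear in u, conjugate linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthonormal list with the same linear span as `vectors`.

    The input is assumed to be an ordered, linearly independent list.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coeff = inner(w, e)                        # component of w along e
            w = [a - coeff * b for a, b in zip(w, e)]  # remove that component
        length = math.sqrt(inner(w, w).real)
        if length < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append([a / length for a in w])
    return basis

basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i, e in enumerate(basis):
    # Each row of inner products should be close to a row of the identity matrix.
    print([round(abs(inner(e, f)), 12) for f in basis])
```

After the k-th step the first k output vectors span the same subspace as the first k input vectors, which is why the order of the input matters.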

Let M be a closed vector subspace of H, so that $H = M \oplus M^\perp$ by Proposition 
12.3. The linear projection operator E of H on M along $M^\perp$, given by the identity 
on M and the 0 operator on $M^\perp$, is called the orthogonal projection of H on M. 
The linear operator E is bounded with $\|E\| \le 1$ because if $u \in H$ decomposes 
as $u = m + m^\perp$, the Pythagorean Theorem gives 

$$\|E(u)\|^2 = \|E(m + m^\perp)\|^2 = \|m\|^2 \le \|m\|^2 + \|m^\perp\|^2 = \|u\|^2.$$

We are going to derive a formula for E in terms of orthonormal sets. 


Lemma 12.6. If $\{u_j\}$ is an orthonormal sequence in the Hilbert space H 
and if $\{c_j\}$ is a sequence of scalars, then $\sum_{j=1}^{\infty} c_j u_j$ converges if and only if 
$\sum_{j=1}^{\infty} |c_j|^2 < \infty$, and in this case 

$$\Big\| \sum_{j=1}^{\infty} c_j u_j \Big\| = \Big( \sum_{j=1}^{\infty} |c_j|^2 \Big)^{1/2}.$$

When the series converges, the sum $\sum_{j=1}^{\infty} c_j u_j$ is independent of the order of the 
terms. 


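PROOF. For $m \ge n$, we have 

$$\Big\| \sum_{j=n}^{m} c_j u_j \Big\|^2 = \Big( \sum_{i=n}^{m} c_i u_i,\ \sum_{j=n}^{m} c_j u_j \Big) = \sum_{i,j=n}^{m} c_i \bar{c}_j (u_i, u_j) = \sum_{j=n}^{m} |c_j|^2.$$

This shows that the sequence $\{\sum_{j=1}^{n} c_j u_j\}$ is Cauchy in H if and only if $\sum_{j=1}^{\infty} |c_j|^2$ 
is convergent, and the first conclusion follows since H is complete. When 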


³ The verification appears in the problems in the companion volume Advanced Real Analysis. 
⁴ Again the verification appears in the problems in the companion volume Advanced Real Analysis. 
$\{\sum_{j=1}^{n} c_j u_j\}$ is convergent, we denote its limit by $\sum_{j=1}^{\infty} c_j u_j$, and continuity 
of the norm yields $\|\sum_{j=1}^{\infty} c_j u_j\| = \lim_n \|\sum_{j=1}^{n} c_j u_j\|$. Since we have seen that 
$\|\sum_{j=1}^{n} c_j u_j\| = \big(\sum_{j=1}^{n} |c_j|^2\big)^{1/2}$, the second conclusion of the lemma follows. 

Let $u = \sum_j c_j u_j$, and let $\sum_k c_{j_k} u_{j_k}$ be a rearrangement, necessarily convergent 
by what has already been proved. Suppose that the rearrangement has sum $u'$. The 
equality just proved shows that $\|u\|^2 = \sum_{j=1}^{\infty} |c_j|^2 = \|u'\|^2$ since rearrangements 
of series of nonnegative reals have the same sums. Continuity of the inner product, 
together with the same computation as made above, gives 

$$(u, u') = \lim_{p,q} \Big( \sum_{i=1}^{p} c_i u_i,\ \sum_{k=1}^{q} c_{j_k} u_{j_k} \Big) = \lim_{p,q} \sum_{\substack{1 \le i \le p, \\ i = j_k \text{ with } k \le q}} |c_i|^2.$$

The limit on the right is $\sum_{i=1}^{\infty} |c_i|^2$ since $\sum_k |c_{j_k}|^2$ is a rearrangement of $\sum_i |c_i|^2$, 
and hence $(u, u') = \sum_{i=1}^{\infty} |c_i|^2 = \|u\|^2 = \|u'\|^2$. Therefore $\|u - u'\|^2 = 
(u, u) - 2\,\mathrm{Re}(u, u') + (u', u') = \|u\|^2 - 2\|u\|^2 + \|u\|^2 = 0$, and $u' = u$. 


Proposition 12.7. Let S be an orthonormal set in the Hilbert space H, and let 
M be the smallest closed vector subspace of H containing S. For each u in H, 
there are at most countably many members $v_\alpha$ of S such that $(u, v_\alpha) \ne 0$, and 
thus the series 

$$E(u) = \sum_{v_\alpha \in S} (u, v_\alpha)\, v_\alpha$$

has only countably many nonzero terms. The series converges independently of 
the order of the nonzero terms, E is the orthogonal projection of H on M, and E 
satisfies 

$$\|E(u)\|^2 = \sum_{v_\alpha \in S} |(u, v_\alpha)|^2 \le \|u\|^2.$$

REMARK. The final inequality of the proposition is Bessel’s inequality. 


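PROOF. Let $v_{\alpha_1}, \dots, v_{\alpha_n}$ be a finite subset of S, and form the vector $u' = 
\sum_{j=1}^{n} (u, v_{\alpha_j})\, v_{\alpha_j}$. Taking the inner product of both sides with u gives 

$$(u, u') = \sum_{j=1}^{n} \overline{(u, v_{\alpha_j})}\,(u, v_{\alpha_j}) = \sum_{j=1}^{n} |(u, v_{\alpha_j})|^2,$$

and Lemma 12.6 gives 

$$\|u'\|^2 = \sum_{j=1}^{n} |(u, v_{\alpha_j})|^2.$$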




Therefore $0 \le \|u - u'\|^2 = \|u\|^2 - 2\,\mathrm{Re}(u, u') + \|u'\|^2 = \|u\|^2 - 2\|u'\|^2 + \|u'\|^2 = 
\|u\|^2 - \|u'\|^2$, and we obtain 

$$\|u'\|^2 \le \|u\|^2. \qquad (*)$$

In other words, 

$$\sum_{j=1}^{n} |(u, v_{\alpha_j})|^2 \le \|u\|^2, \qquad (**)$$

no matter what finite subset $v_{\alpha_1}, \dots, v_{\alpha_n}$ of S we use. 


The sum of uncountably many positive real numbers is infinite, since otherwise 
there could be only finitely many greater than 1/n for each n, and the set of terms 
would then be countable. Since $\|u\|^2 < \infty$, 
(**) implies that there can be only countably many $\alpha$'s with $|(u, v_\alpha)|^2$ nonzero. 
This proves the first conclusion. If we enumerate those $\alpha$'s and apply Lemma 
12.6, we obtain the convergence of $\sum_{v_\alpha \in S} (u, v_\alpha)\, v_\alpha$ to a sum independent of the 
order of the terms. 

It is evident from the formula that E is linear and that $E(u) = 0$ if u is in 
$M^\perp$. Inequality (**) shows that the partial sums $u'$ of $E(u)$ have $\|u'\| \le \|u\|$, 
and the continuity of the norm therefore implies that $\|E(u)\| \le \|u\|$ for all u. 
Hence E is continuous. Since $E(v_\alpha) = v_\alpha$ for all $\alpha$, E is the identity on all finite 
linear combinations of members of S. The continuity of E thus implies that E is 
the identity on all of M. Hence E is the orthogonal projection as asserted. The 
final assertion of the proposition follows from Lemma 12.6 and the inequality 
$\|E(u)\| \le \|u\|$, which we have already proved. 
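A finite-dimensional sketch may help fix the formula for E. The example below assumes $H = \mathbb{R}^3$ with the standard inner product and a two-element orthonormal set S chosen for illustration; the names `inner`, `project`, `S`, `u`, `E_u`, `coeffs`, and `residual` are hypothetical. It computes $E(u) = \sum (u, v_\alpha) v_\alpha$, compares $\|E(u)\|^2$ with the sum of the squared coefficients, and checks that $u - E(u)$ is orthogonal to S.

```python
import math

def inner(u, v):
    """Standard inner product: linear in u, conjugate linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def project(u, orthonormal_set):
    """E(u) = sum over v in the orthonormal set of (u, v) v."""
    result = [0.0] * len(u)
    for v in orthonormal_set:
        c = inner(u, v)
        result = [r + c * b for r, b in zip(result, v)]
    return result

s = 1 / math.sqrt(2)
S = [[s, s, 0.0], [0.0, 0.0, 1.0]]   # orthonormal, but not a basis of R^3
u = [3.0, 1.0, 2.0]

E_u = project(u, S)
coeffs = [inner(u, v) for v in S]
residual = [a - b for a, b in zip(u, E_u)]

print("E(u) =", E_u)
print("||E(u)||^2 =", inner(E_u, E_u).real)
print("sum |(u, v)|^2 =", sum(abs(c) ** 2 for c in coeffs))
print("||u||^2 =", inner(u, u).real)
print("(u - E(u), v) for v in S:", [inner(residual, v) for v in S])
```

Here Bessel's inequality is strict because u has a nonzero component orthogonal to the closed span of S; equality for every u is exactly the situation of an orthonormal basis.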


Corollary 12.8. If S is an orthonormal set in the Hilbert space H, then the 
following are equivalent: 


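(a) S is maximal among orthonormal subsets of H, 
(b) $u = \sum_{v_\alpha \in S} (u, v_\alpha)\, v_\alpha$ for all u in H, 
(c) $\|u\|^2 = \sum_{v_\alpha \in S} |(u, v_\alpha)|^2$ for all u in H, 
(d) $(u, v) = \sum_{v_\alpha \in S} (u, v_\alpha)\,\overline{(v, v_\alpha)}$ for all u and v in H. 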


REMARKS. Condition (b) is summarized by saying that the orthonormal set S 
is an orthonormal basis of H. If H is infinite-dimensional, an orthonormal basis 
is not a basis in the ordinary linear-algebra sense; a passage to the limit is usually 
needed to expand vectors in terms of the basis. Condition (c), or sometimes 
condition (d), is called Parseval’s equality. Thus the corollary says that the 
orthonormal set S is maximal if and only if it is an orthonormal basis, if and only 
if Parseval’s equality holds. 
PROOF. Let M be the smallest closed vector subspace of H containing S. Then 
S is maximal if and only if $M^\perp = 0$, and we replace (a) by this condition. If 
$M^\perp = 0$, then E is the identity operator in Proposition 12.7, and the proposition 
shows that (b) holds. If (b) holds, Proposition 12.7 says that (c) holds. On the 
other hand, if (c) holds, then Proposition 12.7 says that $\|u\| = \|E(u)\|$ for all u. 
For a vector u in $M^\perp$, which must have $E(u) = 0$, this says that $\|u\| = 0$. Thus 
$M^\perp = 0$, and (a) holds. Hence (a), (b), and (c) are equivalent. Finally (c) and 
(d) are equivalent by polarization. 


In the context of Fourier series, Parseval’s equality ((c) in Corollary 12.8) 
was proved as Theorem 6.49, and that theorem showed also that any member of 
$L^2([-\pi, \pi], \frac{1}{2\pi}\,dx)$ is the sum of its Fourier series in the sense of convergence 
in $L^2$. This conclusion was (b) in the corollary. The corollary is showing that 
the equivalence of (b) and (c) is just a result in abstract Hilbert-space theory. The 
extra content of Theorem 6.49 is that these conditions are actually satisfied by 
the system of exponential functions. 
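Parseval's equality for the exponential system can be observed numerically. The sketch below is an illustration under assumptions: the sample function is $f(x) = x$ on $[-\pi, \pi]$, the inner product is $(f, g) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f\bar{g}\,dx$, and the integrals are approximated by a midpoint rule with a hypothetical helper `midpoint_integral`. For this f the exact values are $\|f\|^2 = \pi^2/3$ and $|c_n|^2 = 1/n^2$ for $n \ne 0$, so the partial sums of $\sum |c_n|^2$ increase toward $\pi^2/3$.

```python
import cmath
import math

def midpoint_integral(g, a, b, steps=2000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def fourier_coefficient(f, n):
    """c_n = (f, e^{inx}) = (1/2pi) * integral of f(x) e^{-inx} over [-pi, pi]."""
    integrand = lambda x: f(x) * cmath.exp(-1j * n * x)
    return midpoint_integral(integrand, -math.pi, math.pi) / (2 * math.pi)

def f(x):
    return x

# ||f||^2 relative to the normalized inner product.
norm_sq = midpoint_integral(lambda x: abs(f(x)) ** 2, -math.pi, math.pi) / (2 * math.pi)

# Partial sum of |c_n|^2 over |n| <= N; Bessel's inequality bounds it by ||f||^2.
N = 20
partial = sum(abs(fourier_coefficient(f, n)) ** 2 for n in range(-N, N + 1))

print("||f||^2       ~", norm_sq)
print("partial sum   ~", partial)
print("remaining gap ~", norm_sq - partial)
```

Increasing N shrinks the gap, which reflects (c) of Corollary 12.8; the fact that the gap tends to 0 for every square integrable f is the content of Theorem 6.49 and is not something this computation proves.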

One can show that the other two examples we gave in this section of orthogonal 
sets give orthonormal bases when normalized: the Legendre polynomials $P_n(t)$ 
on $[-1, 1]$ with respect to $dt$ and the functions $J_0(k_n t)$ on $[0, 1]$ with respect to 
$t\,dt$. 


Proposition 12.9. Let $(X, \mu)$ and $(Y, \nu)$ be $\sigma$-finite measure spaces, and 
suppose that $L^2(X, \mu)$ has a countable orthonormal basis $\{u_i\}$ and $L^2(Y, \nu)$ has a 
countable orthonormal basis $\{v_j\}$. Then $\{(x, y) \mapsto u_i(x)v_j(y)\}$ is an orthonormal 
basis of $L^2(X \times Y, \mu \times \nu)$. 


PROOF. The functions $u_i(x)v_j(y)$ are orthonormal, and Corollary 12.8 shows 
that it is enough to prove that this orthonormal set is maximal. Suppose that 
$w(x, y)$ is an $L^2$ function on $X \times Y$ orthogonal to all of them. Then 

$$0 = \int_X \int_Y w(x, y)\,\overline{u_i(x)}\,\overline{v_j(y)}\,d\nu(y)\,d\mu(x) = \int_X (w(x, \cdot), v_j)\,\overline{u_i(x)}\,d\mu(x)$$

for all i and j. Since $\{u_i\}$ is an orthonormal basis of $L^2(X, \mu)$, $x \mapsto (w(x, \cdot), v_j)$ 
is the 0 function in $L^2(X, \mu)$ for each j. In other words, $(w(x, \cdot), v_j) = 0$ for a.e. 
x $[d\mu]$ for that j. Since the number of j's is countable, $(w(x, \cdot), v_j) = 0$ for all j 
for a.e. x $[d\mu]$. Any such x has $0 = \sum_j |(w(x, \cdot), v_j)|^2 = \int_Y |w(x, y)|^2\,d\nu(y)$. 
Integrating in x, we see that w is the 0 function in $L^2(X \times Y, \mu \times \nu)$. 


Proposition 12.10. Any orthonormal set in a closed vector subspace M of a 
Hilbert space H can be extended to an orthonormal basis of M. In particular any 
closed vector subspace M of H has an orthonormal basis. 
PROOF. As a closed subset of a complete space, M is complete, and therefore 
M is a Hilbert space in its own right. Order by inclusion all orthonormal subsets 
of M containing the given set. The given set is one such, and the union of the 
members of a chain is an orthonormal set forming an upper bound for the chain. 
By Zorn’s Lemma we can find a maximal orthonormal set S in M containing the 
given one. This satisfies (a) in Corollary 12.8 and hence is an orthonormal basis. 
This proves the first conclusion, and the second conclusion follows from the first 
by taking the given orthonormal set in M to be empty. 


Proposition 12.11. Any two orthonormal bases of a Hilbert space have the 
same cardinality. 


REMARKS. Cardinality is discussed in Section A10 of the appendix. The “same 
cardinality” whose existence is proved in the proposition is called the Hilbert 
space dimension of the Hilbert space. Problem 7 at the end of the chapter shows 
that two Hilbert spaces are isomorphic as Hilbert spaces if and only if they have 
the same Hilbert space dimension. Despite the apparent definitive sound of this 
result, one must not attach too much significance to the proposition. Hilbert spaces 
that arise in practice tend to have some additional structure, and an isomorphism 
of this kind need not preserve the additional structure. 


PROOF. Fix two orthonormal bases $U = \{u_\alpha\}$ and $V = \{v_\beta\}$ of a Hilbert space 
H. We define two members $u_\alpha$ and $u_{\alpha'}$ of U to be equivalent if there exists a 
sequence 

$$u_{\alpha_1},\ v_{\beta_1},\ u_{\alpha_2},\ v_{\beta_2},\ \dots,\ u_{\alpha_{n-1}},\ v_{\beta_{n-1}},\ u_{\alpha_n} \qquad (*)$$

with $u_{\alpha_1} = u_\alpha$ and $u_{\alpha_n} = u_{\alpha'}$, with each $u_{\alpha_j}$ in U and each $v_{\beta_j}$ in V, and with each 
consecutive pair having nonzero inner product. Define an equivalence relation in 
V similarly. 

Each equivalence class is countable. In fact, consider the class of $u_{\alpha_0}$, and 
consider sequences of a fixed length. Proposition 12.7 shows that only countably 
many members of V can have nonzero inner product with $u_{\alpha_0}$, only countably 
many members of U can have nonzero inner product with each of those, and so on. Thus 
there are only countably many sequences of any particular length. The countable 
union of these countable sets is countable, and thus there are only countably many 
sequences connecting $u_{\alpha_0}$ to anything. Hence $u_{\alpha_0}$ can be equivalent to only 
countably many members of U. 

Let $U_1$ and $V_1$ be equivalence classes in U and V, respectively, and suppose that 
$u_{\alpha_0}$ and $v_{\beta_0}$ are members of $U_1$ and $V_1$ with nonzero inner product. Expand $u_{\alpha_0}$ in 
terms of V as $u_{\alpha_0} = \sum_{v_\beta} (u_{\alpha_0}, v_\beta)\, v_\beta$, retaining only the terms with $(u_{\alpha_0}, v_\beta) \ne 0$. 
One of the terms making a contribution is the one with $v_\beta = v_{\beta_0}$, and it follows 
that any other term with $(u_{\alpha_0}, v_{\beta'}) \ne 0$ has $v_{\beta'}$ equivalent to $v_{\beta_0}$. Hence we have 

$$u_{\alpha_0} = \sum_{v_\beta \in V_1} (u_{\alpha_0}, v_\beta)\, v_\beta \qquad\text{and similarly}\qquad v_{\beta_0} = \sum_{u_\alpha \in U_1} (v_{\beta_0}, u_\alpha)\, u_\alpha.$$

If $u_{\alpha_0'}$ is another member of $U_1$ and we expand it in terms of V, retaining only the 
nonzero terms, then the $v_\beta$'s that occur have to be equivalent to one another. So 
we have $u_{\alpha_0'} = \sum_{v_\beta \in V_2} (u_{\alpha_0'}, v_\beta)\, v_\beta$ for some equivalence class $V_2$ within V. If 
we form a sequence (*) connecting $u_{\alpha_0}$ and $u_{\alpha_0'}$, we see that at least one member 
of $V_2$ is connected to at least one member of $V_1$. Thus $V_1 = V_2$. Consequently 
every member of $U_1$ lies in the smallest closed vector subspace containing $V_1$, 
and every member of $V_1$ lies in the smallest closed subspace containing $U_1$. In 
other words, $U_1$ and $V_1$ are orthonormal bases for the same closed vector subspace 
of H. 

If $U_1$ is finite, then linear algebra shows that $V_1$ is finite and has the same
number of elements. Since $U_1$ and $V_1$ are countable, the only way that either can
be infinite is if both are countably infinite. In any event, $U_1$ and $V_1$ have the same
cardinality. Thus we have a one-one function carrying $U_1$ onto $V_1$. Repeating
this process for each equivalence class within $U$, we obtain a one-one function
carrying $U$ onto $V$.


3. Bounded Linear Operators on Hilbert Spaces 


In this section we briefly study bounded linear operators from a Hilbert space 
H to itself. In the finite-dimensional case we often make a correspondence 
between matrices and linear operators by using the standard basis of the space 
of column vectors. If $\{e_i\}_{i=1}^n$ is this basis, then the correspondence between a
matrix $A = [A_{ij}]$ and a linear operator $L$ is given by $A_{ij} = (L(e_j), e_i)$. If
$u = \sum_j u_j e_j$ and $v = \sum_i v_i e_i$ are column vectors, then $L(u) = \sum_j u_j L(e_j)$ and
hence $(L(u), v) = \sum_{i,j} u_j \bar{v}_i\, (L(e_j), e_i) = \sum_{i,j} \bar{v}_i A_{ij} u_j$.

We could extend these formulas to the case of a general Hilbert space, not 
necessarily finite-dimensional, by using a particular orthonormal basis as the 
generalization of {e;}. But no particular such basis recommends itself, and we 
work without any choice of basis as much as possible, except for purposes of 
motivation. Instead, we may think of the function $(u, v) \mapsto (L(u), v)$ as a more
appropriate, and canonical, analog of the matrix of $L$. Just as the operator norm
of $L$ is given by a formula that views $L$ as an operator, namely


\[ \|L\| = \sup_{\|u\| \le 1} \|L(u)\|, \]


so there is a formula for computing the norm in terms of the function of two
variables, namely

\[ \|L\| = \sup_{\|u\| \le 1,\ \|v\| \le 1} |(L(u), v)|. \]

To verify this formula, fix $u$ and let $v$ have norm $\le 1$. Application of the Schwarz
inequality gives $|(L(u), v)| \le \|L(u)\|\,\|v\| \le \|L(u)\|$. On the other hand, if
$L(u) \ne 0$, we take $v = \|L(u)\|^{-1} L(u)$; this $v$ has $\|v\| = 1$, and we obtain
$|(L(u), v)| = \|L(u)\|^{-1} (L(u), L(u)) = \|L(u)\|$. Hence $\sup_{\|v\| \le 1} |(L(u), v)| =
\|L(u)\|$. Taking the supremum over $\|u\| \le 1$ shows that the two expressions for
$\|L\|$ are equal.
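
The verification above can be checked numerically in a small case. The following is a minimal sketch, assuming real scalars and a hypothetical $2 \times 2$ matrix `A` standing in for $L$; the names `apply`, `inner`, and `norm` are illustrative helpers, not notation from the text. It confirms that the choice $v = \|L(u)\|^{-1}L(u)$ attains $\|L(u)\|$ and that randomly sampled $v$ in the unit ball never exceed it.

```python
import math
import random

def apply(A, u):
    """Matrix-vector product A u for a real matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def inner(x, y):
    """Real inner product."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

# Hypothetical operator L on R^2 and a unit vector u (illustration only).
A = [[2.0, 1.0], [0.0, 3.0]]
u = [0.6, 0.8]

Lu = apply(A, u)
norm_Lu = norm(Lu)

# The maximizing choice from the text: v = L(u) / ||L(u)||.
v_best = [c / norm_Lu for c in Lu]
attained = abs(inner(Lu, v_best))

# Schwarz inequality: no v in the unit ball does better than ||L(u)||.
rng = random.Random(0)
largest_sampled = 0.0
for _ in range(2000):
    v = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    if norm(v) > 1.0:
        continue  # keep only samples inside the unit ball
    largest_sampled = max(largest_sampled, abs(inner(Lu, v)))
```

Sampling only illustrates the inequality; the equality of the two suprema is what the argument in the text proves.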

We shall work with the "adjoint" $L^*$ of a bounded linear operator $L$. In terms
of matrices in the finite-dimensional case, the matrix of $L^*$ is to be the conjugate
transpose of the matrix of $L$. In other words, the $(i, j)$th entry $(L^*(e_j), e_i)$ of the
matrix for $L^*$ is to be $\overline{(L(e_i), e_j)} = (e_j, L(e_i))$. Passing to our functions of two
variables, we want to arrange that $(L^*(u), v) = (u, L(v))$ for all $u$ and $v$. Let us
prove existence and uniqueness of such a bounded linear operator.


Proposition 12.12. Let $L : H \to H$ be a bounded linear operator on the
Hilbert space $H$. For each $u$ in $H$, there exists a unique vector $L^*(u)$ in $H$ such
that

\[ (L^*(u), v) = (u, L(v)) \qquad \text{for all } v \text{ in } H. \]


As $u$ varies, this formula defines $L^*$ as a bounded linear operator on $H$, and
$\|L^*\| = \|L\|$.


PROOF. The function $v \mapsto (L(v), u)$ is a linear functional on $H$ satisfying
$|(L(v), u)| \le \|L\|\,\|v\|\,\|u\|$, hence having norm $\le \|L\|\,\|u\|$. Being bounded,
the linear functional is given by $(L(v), u) = (v, w)$ for some unique $w$ in $H$,
according to Theorem 12.5. We define $L^*(u) = w$, and then we have $(L^*(u), v) =
(u, L(v))$. This formula shows that $L^*$ is a linear operator, and the computation


\[ \|L^*\| = \sup_{\|u\| \le 1,\ \|v\| \le 1} |(L^*(u), v)| = \sup_{\|u\| \le 1,\ \|v\| \le 1} |(u, L(v))| = \sup_{\|u\| \le 1,\ \|v\| \le 1} |(L(v), u)| = \|L\| \]


shows that $\|L^*\| = \|L\|$.


The bounded linear operator L* in the proposition is called the adjoint of L. 
The mapping $L \mapsto L^*$ is conjugate linear. We shall be especially interested in
the case that $L^* = L$, in which case we say that $L$ is self-adjoint.

An example of a self-adjoint operator is the orthogonal projection $E$ on a closed
vector subspace $M$ as defined before Lemma 3.6. In fact, if $u$ in $H$ decomposes
according to $H = M \oplus M^\perp$ as $u = u' + u''$, then the computation $(1 - E)(u) =
u - u' = u''$ shows that $1 - E$ is the orthogonal projection on $M^\perp$. Hence
$(E(u), (1 - E)(v)) = 0$ for all $u$ and $v$ in $H$, and also $((1 - E)(u), E(v)) = 0$.
The first of these says that $(E(u), v) = (E(u), E(v))$, and the second says that
$(E(u), E(v)) = (u, E(v))$. Combining these, we obtain $(E(u), v) = (u, E(v))$.
Comparison of this formula with the formula in Proposition 12.12 shows that
$E = E^*$.
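
As a concrete instance of this computation, here is a small sketch in $\mathbb{R}^3$, assuming $M$ is the line spanned by a hypothetical unit vector `m`, so that $E(u) = (u, m)\,m$. The helper names are illustrative only. It checks the two orthogonality relations and the resulting symmetry $(E(u), v) = (u, E(v))$ on sample vectors.

```python
def inner(x, y):
    """Real inner product on R^3."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical unit vector spanning the subspace M (|m|^2 = 1/9 + 4/9 + 4/9 = 1).
m = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]

def E(u):
    """Orthogonal projection of u on M = span{m}."""
    c = inner(u, m)
    return [c * mi for mi in m]

def one_minus_E(u):
    """Orthogonal projection of u on the orthogonal complement of M."""
    return [a - b for a, b in zip(u, E(u))]

u = [1.0, -2.0, 0.5]
v = [3.0, 0.0, -1.0]

cross_1 = inner(E(u), one_minus_E(v))   # (E(u), (1 - E)(v))
cross_2 = inner(one_minus_E(u), E(v))   # ((1 - E)(u), E(v))
symmetry_gap = inner(E(u), v) - inner(u, E(v))
```

Both cross terms vanish up to rounding, which is exactly what forces $E = E^*$ in the argument above.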

The Banach space $\mathcal{B}(H, H)$ is closed under composition. In fact, if $L$ and $M$
are in $\mathcal{B}(H, H)$, then linear algebra shows $LM$ to be linear, and the computation
$\|(LM)(u)\| = \|L(M(u))\| \le \|L\|\,\|M(u)\| \le \|L\|\,\|M\|\,\|u\|$ shows that


\[ \|LM\| \le \|L\|\,\|M\|. \]


Hence $LM$ is in $\mathcal{B}(H, H)$ if $L$ and $M$ are. Within $\mathcal{B}(H, H)$, we have $(LM)^* =
M^*L^*$.
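
In the finite-dimensional case these identities can be verified directly, since the adjoint is the conjugate transpose. The sketch below assumes $H = \mathbb{C}^2$ with the inner product linear in the first variable, $(x, y) = \sum_i x_i \overline{y_i}$; the matrices `L` and `M` and the vectors are arbitrary illustrative data, and the helper names are not from the text.

```python
def matvec(A, x):
    """Matrix-vector product for a complex matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose: entry (i, j) is the conjugate of A[j][i]."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def inner(x, y):
    """Inner product on C^n, linear in x and conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

L = [[1 + 2j, 3j], [2 - 1j, 4 + 0j]]
M = [[0.5j, 1 + 1j], [2 + 0j, -1j]]
u = [1 - 1j, 2 + 0.5j]
v = [3j, -1 + 2j]

# Defining property of the adjoint: (L*(u), v) = (u, L(v)).
adjoint_gap = abs(inner(matvec(adjoint(L), u), v) - inner(u, matvec(L, v)))

# Product rule: (LM)* = M* L*.
lhs = adjoint(matmul(L, M))
rhs = matmul(adjoint(M), adjoint(L))
product_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```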


4. Hahn-Banach Theorem 


We return now to the setting of general normed linear spaces or Banach spaces. 
There are three main theorems concerning the norm topology of such spaces: the
Hahn-Banach Theorem, the Uniform Boundedness Theorem, and the Interior
Mapping Principle. These three theorems are the main subject matter of the
remainder of this chapter. 

We shall often use symbols x, y,... for members of a normed linear space 
and symbols x*, y*,... for linear functionals. This notation has the advantage 
of allowing us to use symbols like x** for linear functionals on a space of linear 
functionals, an important notion as we shall see. 

We begin with the Hahn—Banach Theorem, which ensures the existence of 
many continuous linear functionals on a normed linear space. The theorem has 
applications even in situations in which one has a concrete realization of the dual 
space, because it shows that any closed vector subspace is characterized by the 
continuous linear functionals that vanish on the subspace. 


Theorem 12.13 (Hahn-Banach Theorem). If $Y$ is a vector subspace of a
normed linear space $X$ and if $y^*$ is a continuous linear functional on $Y$, then there
exists a continuous linear functional $x^*$ on $X$ with $\|x^*\| = \|y^*\|$ such that


\[ x^*(y) = y^*(y) \qquad \text{for all } y \in Y. \]


The theorem as stated is derived from the following lemma, which itself goes
under the name "Hahn-Banach Theorem" and has other applications quite distinct
from Theorem 12.13 that are beyond the scope of this book.


Lemma 12.14. Let $X$ be a real vector space, and let $p$ be a real-valued function
on $X$ with


\[ p(x + x') \le p(x) + p(x') \qquad\text{and}\qquad p(tx) = t\,p(x) \]


for all $x$ and $x'$ in $X$ and all real $t \ge 0$. If $f$ is a linear functional on a vector
subspace $Y$ of $X$ with $f(y) \le p(y)$ for all $y$ in $Y$, then there exists a linear
functional $F$ on $X$ with $F(y) = f(y)$ for all $y \in Y$ and $F(x) \le p(x)$ for all
$x \in X$.


PROOF. Form the collection of all linear functionals on vector subspaces of
$X$ that extend $f$ and that are dominated by $p$, and partially order the collection
by saying that one is $\le$ another if the second is an extension of the first. If we
have a chain of such extensions, then we can obtain an upper bound for the chain
by taking the union of the domains and using the common value of the linear
functionals on an element of this domain as the value of the linear functional
forming the upper bound. The result is linear because any two members of the
domain must lie in the domain of a single member of the chain. By Zorn's Lemma
let $f_0$, with domain $Y_0$, be a maximal extension. We shall prove that $Y_0 = X$.

In fact, suppose that $y_1$ is a vector in $X$ but not $Y_0$. Every vector in the vector
subspace $Y_1$ spanned by $y_1$ and $Y_0$ has a unique representation as $y + cy_1$, where
$y$ is in $Y_0$ and $c$ is in $\mathbb{R}$. Define $f_1$ on $Y_1$ by


\[ f_1(y + cy_1) = f_0(y) + ck, \tag{$*$} \]


where $k$ is a real number to be specified. For a suitable choice of $k$, $f_1$ will be
bounded by $p$ and will contradict the maximality of $(f_0, Y_0)$.
Let $y$ and $y'$ be in $Y_0$. Then


\[ f_0(y') - f_0(y) = f_0(y' - y) \le p(y' - y) \le p(y' + y_1) + p(-y_1 - y), \]


and hence

\[ -p(-y_1 - y) - f_0(y) \le p(y' + y_1) - f_0(y'). \]


Take the supremum of the left side over $y$ and the infimum of the right side over
$y'$, let $k$ be any real number in between, and define $f_1$ on $Y_1$ by $(*)$.

To complete the proof, we are to check that $f_1(x) \le p(x)$ for all $x$ in $Y_1$. Thus
suppose that $x = y + cy_1$ is arbitrary in $Y_1$. If $c = 0$, then $f_1(x) \le p(x)$ by the
assumption on $Y_0$. If $c > 0$, then


\[ f_1(x) = f_0(y) + ck \le f_0(y) + c\,[\,p(c^{-1}y + y_1) - f_0(c^{-1}y)\,] = p(y + cy_1) = p(x). \]

If $c < 0$, then


\[ f_1(x) = f_0(y) + ck \le f_0(y) + c\,[\,-p(-y_1 - c^{-1}y) - f_0(c^{-1}y)\,] = p(y + cy_1) = p(x). \]


In any case, $f_1(x) \le p(x)$.
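
The one-step extension in this proof can be made concrete. The sketch below assumes the hypothetical data $X = \mathbb{R}^2$, $p(x) = |x_1| + |x_2|$, $Y_0 = \{(t, 0)\}$, $f_0(t, 0) = t$, and $y_1 = (0, 1)$, none of which come from the text. It approximates the supremum and infimum that bracket the admissible values of $k$ by scanning a grid in $Y_0$, picks one $k$ in between, and spot-checks that the extended functional stays below $p$.

```python
import random

def p(x):
    """Sublinear function: the l^1 norm on R^2 (an illustrative choice)."""
    return abs(x[0]) + abs(x[1])

def f0(t):
    """Given functional on Y0 = {(t, 0)}: f0(t, 0) = t, dominated by p."""
    return t

# Grid of points y = (t, 0) in Y0; y1 = (0, 1) is the new direction.
grid = [i / 10.0 for i in range(-50, 51)]

# sup over y of  -p(-y1 - y) - f0(y)   and   inf over y' of  p(y' + y1) - f0(y').
lower = max(-p((-t, -1.0)) - f0(t) for t in grid)
upper = min(p((t, 1.0)) - f0(t) for t in grid)

# Any k between the two bounds works; 0.3 is an arbitrary admissible choice.
k = 0.3

def f1(x):
    """Extension f1(y + c*y1) = f0(y) + c*k, written in coordinates."""
    return f0(x[0]) + k * x[1]

rng = random.Random(1)
samples = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(2000)]
dominated = all(f1(x) <= p(x) + 1e-12 for x in samples)
```

For this particular data the bounds work out to $-1$ and $1$, so every $k$ in $[-1, 1]$ gives an extension dominated by $p$; the nonuniqueness of $k$ is why Hahn-Banach extensions are generally not unique.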


PROOF OF THEOREM 12.13. If the field of scalars is $\mathbb{R}$, then Theorem 12.13
follows immediately from Lemma 12.14 with $p(x) = \|y^*\|\,\|x\|$ and $f = y^*$.

If the field of scalars is $\mathbb{C}$, if $y^*$ is given, and if, as we may, we regard $X$ as a
real normed linear space, then $\mathrm{Re}\, y^*$ defined by $(\mathrm{Re}\, y^*)(y) = \mathrm{Re}(y^*(y))$ is a real
linear functional on $Y$ with


\[ (\mathrm{Re}\, y^*)(y) \le |y^*(y)| \le \|y^*\|\,\|y\| \qquad \text{for all } y \in Y. \]


By what has already been proved, we can extend $\mathrm{Re}\, y^*$ without an increase in
norm to a real linear functional $F$ defined on all of $X$. Define


\[ x^*(x) = F(x) - iF(ix). \]


We show that $x^*$ has the required properties. Certainly $x^*(x + x') = x^*(x) + x^*(x')$
and $x^*(cx) = c\,x^*(x)$ for $c$ real. Furthermore


\[ x^*(ix) = F(ix) - iF(i^2 x) = i\,[F(x) - iF(ix)] = i\,x^*(x). \]

Thus $x^*$ is complex linear. On $Y$, we have

\[ (\mathrm{Re}\, y^*)(iy) + i(\mathrm{Im}\, y^*)(iy) = y^*(iy) = i\,y^*(y) = -(\mathrm{Im}\, y^*)(y) + i(\mathrm{Re}\, y^*)(y), \]


and thus $(\mathrm{Re}\, y^*)(iy) = -(\mathrm{Im}\, y^*)(y)$. Substituting this identity into the definition
of $x^*$, we obtain


\[ x^*(y) = (\mathrm{Re}\, y^*)(y) - i(\mathrm{Re}\, y^*)(iy) = (\mathrm{Re}\, y^*)(y) + i(\mathrm{Im}\, y^*)(y) = y^*(y) \]


for $y$ in $Y$. Thus $x^*$ is an extension of $y^*$. Finally if $x^*(x) = re^{i\theta}$ for $r$ and $\theta$ real
and $r \ge 0$, then


\[ |x^*(x)| = x^*(e^{-i\theta}x) = F(e^{-i\theta}x) \le \|y^*\|\,\|e^{-i\theta}x\| = \|y^*\|\,\|x\|, \]


since the nonnegative number $x^*(e^{-i\theta}x)$ has 0 imaginary part. Thus $\|x^*\| \le \|y^*\|$.
The reverse inequality follows because $x^*$ is an extension of $y^*$, and the proof is
complete.
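
The formula $x^*(x) = F(x) - iF(ix)$ rests on the fact that a complex-linear functional is determined by its real part. The following sketch illustrates only that identity, not the extension step: it assumes a hypothetical functional `y_star` on $\mathbb{C}^2$ with illustrative coefficients `a`, takes $F = \mathrm{Re}\, y^*$, and checks that the formula reproduces `y_star` and is complex linear at a sample point.

```python
# Hypothetical complex-linear functional y*(x) = a1*x1 + a2*x2 on C^2.
a = (2 - 1j, 0.5 + 3j)

def y_star(x):
    return a[0] * x[0] + a[1] * x[1]

def F(x):
    """Real part of y*, regarded as a real-linear functional."""
    return y_star(x).real

def x_star(x):
    """Complex functional rebuilt from its real part: F(x) - i F(ix)."""
    ix = [1j * c for c in x]
    return F(x) - 1j * F(ix)

points = [(1 + 0j, 0j), (0j, 1j), (2 - 3j, -1 + 0.5j), (0.25j, 4 + 4j)]

# x* agrees with y* at every test point.
recovery_gap = max(abs(x_star(x) - y_star(x)) for x in points)

# Complex homogeneity x*(ix) = i x*(x), as in the proof.
x = points[2]
homogeneity_gap = abs(x_star([1j * c for c in x]) - 1j * x_star(x))
```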


Corollary 12.15. If $Y$ is a closed vector subspace of a normed linear space $X$
and if $x_0$ is a vector of $X$ not in $Y$, then there exists an $x^*$ in the dual $X^*$ with


\[ x^*(y) = 0 \quad \text{for all } y \in Y \qquad\text{and}\qquad x^*(x_0) = 1. \]


The norm of $x^*$ can be taken to be the reciprocal of the distance from $x_0$ to $Y$.


PROOF. Let $d > 0$ be the distance from $x_0$ to $Y$, and let $Z$ be the linear span of
$x_0$ and $Y$. Every $x$ in $Z$ has a unique expansion as $x = y + cx_0$ for some scalar $c$
and some $y$ in $Y$. For such an $x$, let $z^*(x) = c$. Let us see that the linear function
$z^*$ on $Z$ satisfies

\[ \|z^*\| = d^{-1}. \tag{$*$} \]


First we check that $|z^*(x)| \le d^{-1}\|x\|$: if $c \ne 0$, then

\[ \|x\| = \|y + cx_0\| = |c|\,\|c^{-1}y + x_0\| \ge |c|\,d = d\,|z^*(x)|, \]


while if $c = 0$, then $z^*(x) = 0$. Thus $|z^*(x)| \le d^{-1}\|x\|$ for all $x$, and we
obtain $\|z^*\| \le d^{-1}$. For the reverse inequality, let $\{y_n\}$ be a sequence in $Y$, not
necessarily convergent, with $\lim_n \|x_0 - y_n\| = d$. Then


\[ 1 = z^*(x_0 - y_n) \le \|z^*\|\,\|x_0 - y_n\| \longrightarrow d\,\|z^*\|, \]


and hence $\|z^*\| \ge d^{-1}$. This proves $(*)$. Applying Theorem 12.13 to $z^*$, we
obtain the corollary.
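
A two-dimensional example makes the norm computation in $(*)$ tangible. The sketch assumes $X = \mathbb{R}^2$ with the Euclidean norm, $Y$ the horizontal axis, and the hypothetical vector $x_0 = (1, 2)$, so that $d = 2$ and $Z = X$; writing $x = y + cx_0$ forces $c = x_2/2$. It estimates $\|z^*\|$ by scanning the unit circle and compares the result with $d^{-1}$.

```python
import math

x0 = (1.0, 2.0)          # hypothetical vector not in Y = {(t, 0)}
d = abs(x0[1])           # Euclidean distance from x0 to the horizontal axis

def z_star(x):
    """Coefficient c in the unique expansion x = y + c*x0 with y in Y."""
    return x[1] / x0[1]

# Estimate the norm of z* as the maximum of |z*(x)| over the unit circle.
N = 3600
norm_estimate = max(
    abs(z_star((math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))))
    for k in range(N)
)

vanishes_on_Y = all(z_star((t, 0.0)) == 0.0 for t in (-3.0, 0.0, 1.5))
value_at_x0 = z_star(x0)
```

Here no Hahn-Banach extension is needed because $Z$ is already all of $X$; the extension step matters only when $Z$ is a proper subspace.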


EXAMPLE. To illustrate Corollary 12.15, we re-prove the result of Proposition
11.21a that $C(S)$ is dense in $L^p(S, \mu)$ if $S$ is a compact Hausdorff space, $\mu$ is
a regular Borel measure on $S$, and $p$ satisfies $1 \le p < \infty$. For definiteness let
us suppose that the underlying scalars are real. If $C(S)$ were not dense, then
the corollary would produce a continuous linear functional $\ell$ on $L^p(S, \mu)$ that
vanishes on $C(S)$ but is not identically 0 on $L^p(S, \mu)$. Theorem 9.19 says that
$\ell$ has to be given by integration with some member $g$ of $L^{p'}(S, \mu)$, where $p'$ is
the dual index: $\ell(f) = \int_S fg\,d\mu$ for all $f$ in $L^p(S, \mu)$. Since $\ell$ vanishes on
$C(S)$, we have $\int_S fg\,d\mu = 0$ for all $f \in C(S)$. Thus $\int_S fg^+\,d\mu = \int_S fg^-\,d\mu$
for all $f \in C(S)$. Here $g^+\,d\mu$ and $g^-\,d\mu$ are Borel measures on $S$, regular by
Proposition 11.20, and they yield the same positive linear functional on $C(S)$.
Applying the uniqueness in the Riesz Representation Theorem (Theorem 11.1),
we obtain $g^+\,d\mu = g^-\,d\mu$ and therefore $g^+ = g^-$ almost everywhere. Since $g^+$
and $g^-$ are nowhere both nonzero, $g^+ = g^- = 0$ almost everywhere. Hence $g$ is
the 0 function, and $\ell = 0$, contradiction.


Corollary 12.16. If $X$ is a normed linear space and if $x_0 \ne 0$ is a vector in $X$,
then there is an $x^*$ in $X^*$ with


\[ \|x^*\| = 1 \qquad\text{and}\qquad x^*(x_0) = \|x_0\|. \]


PROOF. Apply Corollary 12.15 with $Y = 0$ and multiply by $\|x_0\|$ the linear
functional that is produced by that corollary.


Corollary 12.16, when applied to $x_0 = x - x'$, shows that there are enough
continuous linear functionals on a normed linear space $X$ to separate points. Also,
it implies that the only vector $x_0$ in $X$ with $x^*(x_0) = 0$ for all $x^*$ in $X^*$ is $x_0 = 0$.
The third corollary we have already seen for $L^p$ spaces with $1 \le p \le \infty$ in
Proposition 9.8, at least when the measure space is $\sigma$-finite.


Corollary 12.17. If $X$ is a normed linear space and $x_0$ is in $X$, then


\[ \|x_0\| = \sup_{\|x^*\| \le 1} |x^*(x_0)|. \]


PROOF. If $\|x^*\| \le 1$, then $|x^*(x_0)| \le \|x^*\|\,\|x_0\| \le \|x_0\|$, and therefore
$\sup_{\|x^*\| \le 1} |x^*(x_0)| \le \|x_0\|$. The linear functional of Corollary 12.16 shows that
equality holds.
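
For a finite-dimensional illustration, take $X = \mathbb{R}^3$ with the $\ell^1$ norm, whose dual norm is the $\ell^\infty$ norm on coefficient vectors (a standard fact assumed here, not proved in this passage). The sketch below, with a hypothetical vector `x0`, finds the supremum by trying the sign vectors, which are the extreme points of the dual unit ball, and spot-checks the upper bound on random functionals of dual norm at most 1.

```python
import random
from itertools import product

x0 = (3.0, -4.0, 1.0)                      # hypothetical vector in R^3
l1_norm = sum(abs(c) for c in x0)          # the norm of x0 in X

def pair(functional, x):
    """Value of the functional with coefficient vector `functional` at x."""
    return sum(s * c for s, c in zip(functional, x))

# Candidates of dual (sup) norm 1: all vectors of signs.
best = max(abs(pair(signs, x0)) for signs in product((-1.0, 1.0), repeat=3))

# The norming functional of Corollary 12.16 for this x0: the signs of its entries.
norming = tuple(1.0 if c >= 0 else -1.0 for c in x0)

# Random functionals in the dual unit ball never exceed the norm of x0.
rng = random.Random(2)
bounded = all(
    abs(pair([rng.uniform(-1, 1) for _ in range(3)], x0)) <= l1_norm + 1e-12
    for _ in range(2000)
)
```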


We have seen for $\sigma$-finite measure spaces that the dual of $X = L^1(S, \mu)$ may be identified
with $L^\infty(S, \mu)$ via integration. In turn every member of $L^1(S, \mu)$ then acts as a
continuous linear functional on $L^\infty(S, \mu)$ via integration. This change of point
of view amounts to the implementation of a certain canonically defined linear
mapping of $X$ into $X^{**}$, which we now define for general normed linear spaces.

Let $X$ be a normed linear space, and let $X^{**}$ be the dual of $X^*$. We define a
linear operator $\iota : X \to X^{**}$ by


\[ (\iota(x))(x^*) = x^*(x) \qquad \text{for all } x^* \in X^*, \]

and we call $\iota$ the canonical map of $X$ into $X^{**}$.


Corollary 12.18. If $X$ is a normed linear space, then the canonical map
$\iota : X \to X^{**}$ has $\|\iota(x)\| = \|x\|$ for all $x$ and in particular is one-one.
Consequently if $X$ is complete, then $\iota(X)$ is a closed vector subspace of $X^{**}$.


PROOF. We have


\[ \|\iota(x)\| = \sup_{\|x^*\| \le 1} |(\iota(x))(x^*)| = \sup_{\|x^*\| \le 1} |x^*(x)| = \|x\|, \]


the last step holding by Corollary 12.17. This proves the first conclusion. Because
$\iota$ preserves norms, $X$ complete implies that $\iota(X)$ is a complete subset of the
complete space $X^{**}$ and is therefore closed, by Corollary 2.43.


A Banach space $X$ is said to be reflexive if the canonical map carries $X$ onto
$X^{**}$. Warning: This is a more restrictive condition than to say that there is some
norm-preserving linear mapping of $X$ onto $X^{**}$.

Finite-dimensional normed linear spaces are reflexive since linear functionals
in this case are automatically continuous and since the vector-space dual of
a finite-dimensional vector space has the same dimension as the space itself.
Hilbert spaces are reflexive as a consequence of the Riesz Representation Theorem
in its form in Theorem 12.5. The spaces $L^p(S, \mu)$ for a $\sigma$-finite measure space,
when $1 < p < \infty$, are reflexive as a consequence of the Riesz Representation
Theorem in its form in Theorem 9.19 (see the footnote on $\sigma$-finiteness). However, $L^1(S, \mu)$ and $L^\infty(S, \mu)$ are
often not reflexive, as is shown below in Proposition 12.19 and Corollary 12.21.


Proposition 12.19. If $(S, \mu)$ is a $\sigma$-finite measure space with infinitely many
disjoint sets of positive measure, then $L^1(S, \mu)$ is not reflexive.


PROOF. Theorem 9.19 shows that the Banach space $X = L^1(S, \mu)$ has $X^* =
L^\infty(S, \mu)$, the isomorphism being given by integration. Therefore it is enough to
produce a continuous linear functional on $L^\infty(S, \mu)$ that is not given by integration
with an $L^1$ function.

Thus let $\{E_n\}$ be a sequence of disjoint sets of positive measure, and let $Y$ be
the vector subspace of functions in $L^\infty(S, \mu)$ that are constant on each $E_n$ and
have values on the $E_n$'s tending to a finite limit as $n$ tends to infinity. Let $y^*$ of
such a function be the limit. Then $y^*$ is a linear functional on $Y$ of norm 1. By
the Hahn-Banach Theorem (Theorem 12.13), there exists a linear functional $x^*$
defined on all of $L^\infty(S, \mu)$, having norm 1, and restricting to $y^*$ on $Y$. Suppose
that there is some $g$ in $L^1(S, \mu)$ with $x^*(f) = \int_S fg\,d\mu$ for all $f$ in $Y$, quite apart
from all $f$ in $L^\infty(S, \mu)$. If $f$ is 1 on $E_n$ and is 0 elsewhere, then $x^*(f) = 0$, and
hence $\int_{E_n} g\,d\mu = 0$. In other words, $\int_{E_n} g\,d\mu = 0$ for every $n$. If we next take $f$
to be 1 on $\bigcup_{n=1}^\infty E_n$ and to be 0 elsewhere, then $x^*(f) = 1$. On the other hand,
this $f$ has


\[ x^*(f) = \int_S fg\,d\mu = \int_{\bigcup_{n=1}^\infty E_n} g\,d\mu = \sum_{n=1}^\infty \int_{E_n} g\,d\mu = 0, \]

and we have a contradiction.


Proposition 12.20. If $X$ is a Banach space and its dual $X^*$ is reflexive, then
$X$ is reflexive.


PROOF. Let $\iota : X \to X^{**}$ and $\iota^* : X^* \to X^{***}$ be the canonical maps.
Arguing by contradiction, suppose that $X$ is not reflexive. Since $\iota(X)$ is a closed
proper vector subspace of $X^{**}$, Corollary 12.15 produces a nonzero member
$x^{***}$ of $X^{***}$ such that $x^{***}(\iota(X)) = 0$. Since $X^*$ is reflexive by assumption,
there exists $x^*$ in $X^*$ with $x^{***} = \iota^*(x^*)$. If $x$ is in $X$, then we have $0 =
x^{***}(\iota(x)) = (\iota^*(x^*))(\iota(x)) = (\iota(x))(x^*) = x^*(x)$, and hence $x^* = 0$. But then
$x^{***} = \iota^*(x^*) = 0$, and we have a contradiction.


Footnote: Actually, the $\sigma$-finiteness is not needed for $1 < p < \infty$.


Corollary 12.21. If $(S, \mu)$ is a $\sigma$-finite measure space with infinitely many
disjoint sets of positive measure, then $L^\infty(S, \mu)$ is not reflexive.


PROOF. Theorem 9.19 shows that the Banach space $X = L^1(S, \mu)$ has $X^* =
L^\infty(S, \mu)$, the isomorphism being given by integration. If $X^*$ were reflexive, then
$X$ would have to be reflexive by Proposition 12.20, in contradiction to Proposition
12.19.


5. Uniform Boundedness Theorem 


The second main theorem about the norm topology of normed linear spaces is the
Uniform Boundedness Theorem, also known as the Banach-Steinhaus Theorem.
This result involves a parametrized family of linear operators from one normed
linear space into another, and it is assumed that the domain is complete. Two kinds
of boundedness as a function of one variable are assumed: boundedness of each
linear operator as a function on (the unit ball of) the domain and boundedness in the
parameter for each fixed member of the domain. The conclusion is boundedness
in the two variables jointly.


Theorem 12.22 (Uniform Boundedness Theorem). If $\{L_\alpha\}$ is a set of bounded
linear operators from a Banach space $X$ into a normed linear space $Y$ such that


\[ \|L_\alpha(x)\| \le C_x \qquad \text{for all } \alpha, \]

then there is a constant $C$ independent of $x$ such that $\|L_\alpha\| \le C$ for all $\alpha$.


PROOF. For each positive integer $n$, the set

\[ F_n = \{x \in X \mid \|L_\alpha(x)\| \le n \text{ for all } \alpha\} \]


is closed in $X$, being the intersection of inverse images of closed sets in $Y$ under
continuous functions, and $\bigcup_{n=1}^\infty F_n = X$ by assumption. By the Baire Category
Theorem (Theorem 2.53b), one of the sets, say $F_N$, contains a nonempty open
subset $B$ of $X$. Then $\|L_\alpha(x)\| \le N$ for all $\alpha$ and for all $x$ in $B$. If $B$ contains the
open ball in $X$ of radius $2r > 0$ and center $b$, then $\|x\| \le r$ implies that $x + b$ is
in $B$ and that


\[ \|L_\alpha(x)\| = \|L_\alpha(x + b) - L_\alpha(b)\| \le \|L_\alpha(x + b)\| + \|L_\alpha(b)\| \le N + C_b, \]

independently of $\alpha$. Hence $\|x\| \le 1$ implies

\[ \|L_\alpha(x)\| = r^{-1}\|L_\alpha(rx)\| \le r^{-1}(N + C_b). \]


In other words, $\|L_\alpha\| \le r^{-1}(N + C_b)$.


EXAMPLE. Let us use the theorem to give a proof that the Fourier series of
a continuous periodic function need not converge at some point. Consider the
Banach space $X$ of all continuous periodic functions $f$ on $[-\pi, \pi]$ with the
supremum norm. Let $D_n$ be the Dirichlet kernel as in Section I.10, given by


\[ D_n(t) = \sum_{k=-n}^{n} e^{ikt} = \frac{\sin((n + \tfrac12)t)}{\sin \tfrac12 t}. \]


The $n$th partial sum of the Fourier series of $f$ is

\[ S_n(f; x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x - t)\,D_n(t)\,dt. \]


Define linear functionals $\ell_n$ on $X$ by

\[ \ell_n(f) = S_n(f; 0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(-t)\,D_n(t)\,dt. \]


Each of these is bounded; specifically $\|\ell_n\| \le 2n + 1$ because $\|D_n\|_{\sup} \le 2n + 1$.
If the Fourier series of each continuous function $f$ were to converge at 0, then
$\lim_n \ell_n(f)$ would exist for each $f$, and hence we would have $|\ell_n(f)| \le C_f$ for
a constant $C_f$ independent of $n$. The Uniform Boundedness Theorem would say
that $\|\ell_n\| \le C$ for some constant $C$ independent of $n$. The norm equality of
Theorem 11.26 or 11.28 would then allow us to conclude that $\frac{1}{2\pi}\int_{-\pi}^{\pi} |D_n(t)|\,dt$ is
bounded. In fact, the numbers $\frac{1}{2\pi}\int_{-\pi}^{\pi} |D_n(t)|\,dt$ are unbounded, according to the
following proposition, and thus there exists a continuous periodic function whose
Fourier series diverges at $x = 0$.


Proposition 12.23. The numbers


\[ L_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} |D_n(t)|\,dt \]

have the property that

\[ L_n = 4\pi^{-2}\log n + O(1), \]


where $O(1)$ denotes an expression bounded as a function of $n$. Hence $L_n$ is
unbounded with $n$.


REMARK. The numbers $L_n$ are sometimes called Lebesgue constants.
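

PROOF. By writing $\sin((n + \tfrac12)t) = \sin nt\,\cos\tfrac12 t + \cos nt\,\sin\tfrac12 t$, we see that

\[ D_n(t) = \sin nt\,\cot\tfrac12 t + \cos nt = 2t^{-1}\sin nt + h_n(t), \]


where $h_n(t)$ is bounded in the pair $(n, t)$ for $|t| \le \pi$. If we let $O(1)$ denote an
expression bounded as a function of $n$, then


\[
\begin{aligned}
L_n &= \frac{1}{\pi}\int_{-\pi}^{\pi} \frac{|\sin nt|}{|t|}\,dt + O(1) \\
    &= \frac{2}{\pi}\int_{0}^{\pi} \frac{|\sin nt|}{t}\,dt + O(1) \\
    &= \frac{2}{\pi}\sum_{k=0}^{n-1}\int_{k\pi/n}^{(k+1)\pi/n} \frac{|\sin nt|}{t}\,dt + O(1) \\
    &= \frac{2}{\pi}\int_{0}^{\pi/n} \frac{\sin nt}{t}\,dt + \frac{2}{\pi}\int_{0}^{\pi/n} (\sin nt)\Big[\sum_{k=1}^{n-1}\frac{1}{t + k\pi/n}\Big]\,dt + O(1).
\end{aligned}
\]


The first term on the right side is bounded, and the sum in brackets lies between

\[ \pi^{-1}n\Big(1 + \tfrac12 + \cdots + \tfrac{1}{n-1}\Big) \qquad\text{and}\qquad \pi^{-1}n\Big(\tfrac12 + \cdots + \tfrac1n\Big), \]


which are upper and lower Riemann sums for $\pi^{-1}n\int_1^n t^{-1}\,dt$ and have difference
$\pi^{-1}n(1 - \tfrac1n)$. Thus the sum in brackets is equal to $\pi^{-1}n(\log n + O(1))$. The
integral of $\sin nt$ over $[0, \pi/n]$ is $2/n$, and the result follows.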
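
The asymptotic formula can be observed numerically. The sketch below approximates $L_n$ by a midpoint rule on $[0, \pi]$, using the evenness of $D_n$; the function names and the step count are illustrative assumptions, and the quadrature is only approximate. It then looks at the gap $L_n - 4\pi^{-2}\log n$, which the proposition says stays bounded while $L_n$ itself grows.

```python
import math

def dirichlet(n, t):
    """Dirichlet kernel D_n(t) = sin((n + 1/2) t) / sin(t / 2), for t not a multiple of 2*pi."""
    return math.sin((n + 0.5) * t) / math.sin(0.5 * t)

def lebesgue_constant(n, steps=20000):
    """Midpoint-rule approximation of (1/(2*pi)) * integral of |D_n| over [-pi, pi].

    Since |D_n| is even, this equals (1/pi) times the integral over [0, pi];
    midpoints avoid the removable singularity at t = 0.
    """
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += abs(dirichlet(n, t))
    return total * h / math.pi

ns = (1, 5, 20, 50)
constants = {n: lebesgue_constant(n) for n in ns}

# Gap between L_n and the leading term (4 / pi^2) * log n.
gaps = {n: constants[n] - 4.0 / math.pi ** 2 * math.log(n) for n in ns}
```

For the sampled values of $n$ the gaps stay in a narrow range while the constants increase, consistent with logarithmic growth; a finite sample is of course not a proof of boundedness.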


6. Interior Mapping Principle 


The third main theorem about the norm topology of normed linear spaces is the 
Interior Mapping Principle. This result involves a single bounded linear operator 
from one normed linear space into another, and it is assumed that the domain and 
the range are both complete. The theorem is that if the operator is onto the range, 
then it carries open sets to open sets. 


Theorem 12.24 (Interior Mapping Principle). If L is a continuous linear 
operator from a Banach space X onto a Banach space Y, then L carries open 
subsets of X to open subsets of Y. 


PROOF. Let $B_r$ be the closed ball in $X$ with center 0 and radius $r$, and let $U_s$
be the open ball in $Y$ with center 0 and radius $s$. The proof is in three steps.

The first step is to show that $(L(B_1))^{\mathrm{cl}}$ contains an open neighborhood of 0 in
$Y$. To do so, we use the fact that $L$ is onto $Y$ to write


\[ Y = L(X) = L\Big(\bigcup_{n=1}^\infty B_n\Big) = \bigcup_{n=1}^\infty L(B_n). \]

Thus $Y = \bigcup_{n=1}^\infty (L(B_n))^{\mathrm{cl}}$, and the Baire Category Theorem (Theorem 2.53b)
shows that one of the sets $(L(B_n))^{\mathrm{cl}}$ contains a nonempty open set. Since $L$ is
linear and since multiplication by $2n$ is a homeomorphism of $Y$, $(L(B_n))^{\mathrm{cl}} =
(L(2nB_{1/2}))^{\mathrm{cl}} = (2nL(B_{1/2}))^{\mathrm{cl}} = (2n)(L(B_{1/2}))^{\mathrm{cl}}$, and we see that $(L(B_{1/2}))^{\mathrm{cl}}$
contains some nonempty open subset $V$ of $Y$. If $v$ and $v'$ are in $V$, they are in
$(L(B_{1/2}))^{\mathrm{cl}}$ and there exist sequences $\{v_n\}$ and $\{v_n'\}$ in $L(B_{1/2})$ with $v_n \to v$ and
$v_n' \to v'$. By linearity, $v_n - v_n'$ is in $L(B_1)$, and passage to the limit shows that
$v - v'$ is in $(L(B_1))^{\mathrm{cl}}$. The set $V - V$ of such differences $v - v'$ is the union over
$v' \in V$ of $V - v'$, hence is the union of open sets and is open. Since 0 is in $V - V$,
the set $V - V$ is an open neighborhood of 0 lying in $(L(B_1))^{\mathrm{cl}}$.

The second step is to show that the image of any neighborhood of 0 in $X$ is
a neighborhood of 0 in $Y$. The previous step shows that $(L(B_1))^{\mathrm{cl}} \supseteq U_s$ for
some $s > 0$, and we show for every $c > 0$ that $L(B_c) \supseteq U_{sc/2}$. For $t > 0$,
multiplication of the inclusion $(L(B_1))^{\mathrm{cl}} \supseteq U_s$ by $t$ shows that


\[ (L(B_t))^{\mathrm{cl}} \supseteq U_{st} \tag{$*$} \]


since multiplication by $t$ is a homeomorphism of $Y$ and $L$ is linear. If $y$ is in
$U_{sc/2}$, we are to produce $x$ in $B_c$ with $L(x) = y$, and we do so by successive
approximations. Specifically we construct inductively the terms $x_n$ of a convergent
series in $X$ with sum $x$, as follows: Condition $(*)$ with $t = c/2$ allows us
to choose a member $x_1$ of $B_{c/2}$ with $\|y - L(x_1)\| < 2^{-2}sc$. If $x_1, \dots, x_{n-1}$ have
been constructed with each $x_j$ in $B_{2^{-j}c}$, and with


\[ \|y - L(x_1 + \cdots + x_{n-1})\| < 2^{-n}sc, \]


then $y - L(x_1 + \cdots + x_{n-1})$ is in $U_{2^{-n}sc}$. Condition $(*)$ with $t = 2^{-n}c$ shows that
we can find $x_n$ in $B_{2^{-n}c}$ with


\[ \|y - L(x_1 + \cdots + x_{n-1}) - L(x_n)\| < 2^{-(n+1)}sc. \]


We now have


\[ \|y - L(x_1 + \cdots + x_{n-1} + x_n)\| < 2^{-(n+1)}sc. \]

This completes the inductive construction of the $x_n$'s, and we shall prove that the
series $\sum x_n$ is convergent in $X$. Since $X$ is complete, it is enough to show that
the partial sums of $\sum x_n$ are Cauchy. If $q > p$, then

\[ \Big\|\sum_{n=1}^{q} x_n - \sum_{n=1}^{p} x_n\Big\| = \Big\|\sum_{n=p+1}^{q} x_n\Big\| \le \sum_{n=p+1}^{q} \|x_n\| \le \sum_{n=p+1}^{\infty} 2^{-n}c. \]

The right side is $\le 2^{-p}c$, and the partial sums of $\sum x_n$ are indeed Cauchy. Let
$x = \sum_{n=1}^\infty x_n$. Taking $p = 0$ and using the continuity of the norm, we see that
$\|x\| \le c$. By continuity of $L$, we have $y = \lim_n L(x_1 + \cdots + x_n) = L(x)$.
Consequently the member $y$ of $U_{sc/2}$ is of the form $L(x)$ for some $x$ in $B_c$, as was
asserted.

The third step is to show that each open set of $X$ is mapped to an open set of
$Y$ by $L$. Let $U$ be open in $X$, let $x$ be in $U$, and let $N$ be an open neighborhood
of 0 in $X$ such that $x + N \subseteq U$. The previous step shows that there is some
open neighborhood $V$ of 0 in $Y$ such that $V \subseteq L(N)$. Then $L(x) + V$ is an open
neighborhood in $Y$ of $L(x)$ with


\[ L(x) + V \subseteq L(x) + L(N) = L(x + N) \subseteq L(U). \]


Therefore $L(U)$ contains a neighborhood about each of its points and must be
open.


Corollary 12.25. A one-one continuous linear operator L of a Banach space 
X onto a Banach space Y has a continuous linear inverse. 


PROOF. Since $L$ is one-one onto, $L^{-1}$ exists. For $L^{-1}$ to be continuous, the
inverse image under $L^{-1}$ of each open set is to be open. In other words, the direct
image under $L$ of any open set is to be open. But this is just the conclusion of
Theorem 12.24.


EXAMPLE. Let $\mathcal{F}$ be the Fourier coefficient mapping, which carries functions
in $L^1(\frac{1}{2\pi}\,dx)$ to doubly infinite sequences $\{c_n\}$ vanishing at infinity. The linear
operator $\mathcal{F}$ has norm 1 when the space of doubly infinite sequences is given the
supremum norm $\|\{c_n\}\|_{\sup} = \sup_n |c_n|$. Corollary 6.50 shows that $\mathcal{F}$ is one-one.
Let us see that there is some doubly infinite sequence vanishing at infinity that
is not the sequence of Fourier coefficients of some $L^1$ function. If this were
not so, then Corollary 12.25 would say that $\mathcal{F}^{-1}$ is bounded. We can obtain a
contradiction if we produce a sequence $\{f_n\}$ of $L^1$ functions with $\|f_n\|_1 = 1$ for
all $n$ and with $\lim_n \|\mathcal{F}(f_n)\|_{\sup} = 0$. Form the Dirichlet kernel $D_n$ as defined
in Section I.10 and reproduced in the previous section. Its Fourier coefficients
$c_k$ are 1 for $|k| \le n$ and are 0 for $|k| > n$, and thus $\|\mathcal{F}(D_n)\|_{\sup} = 1$. Put
$f_n = D_n/\|D_n\|_1$. Then $\|f_n\|_1 = 1$ for all $n$, and $\|\mathcal{F}(f_n)\|_{\sup} = 1/\|D_n\|_1$.
Proposition 12.23 shows that in fact $\lim_n 1/\|D_n\|_1 = 0$, and we obtain the desired
contradiction. The conclusion is that the image of $\mathcal{F}$ on $L^1$ fails to include some
doubly infinite sequence $\{c_n\}$ vanishing at infinity.

If $f : X \to Y$ is a function between Hausdorff spaces, the graph of $f$ is
the subset $G = \{(x, f(x)) \mid x \in X\}$ of $X \times Y$. If $f$ is continuous, then $G$ is
a closed set, as we see immediately by using nets. The converse fails because
$f : [0, 1] \to \mathbb{R}$ with $f(0) = 0$ and $f(x) = 1/x$ for $x > 0$ is a discontinuous
function with closed graph.

We shall be interested in the converse under the additional condition that our 
function f is linear. Our spaces being metric spaces, the condition that the graph 
be closed is that whenever {(x,, f(x,))} converges to some (x, y), then x is in 
the domain of f and f(x) = y. 

Linearity by itself is not enough to get an affirmative result. In fact, let $X =
C([0, 1])$, let $X_0$ be the vector subspace of functions with a continuous derivative,
and let $L : X_0 \to X$ be the derivative operator $F \mapsto F'$. If $\lim_n F_n = F$ in $X$ and
$\lim_n F_n' = H$, then Theorem 1.23 shows that $F'$ exists and equals $H$. Hence the
linear operator $L : X_0 \to X$ has closed graph. However, $L$ is unbounded since
the function $x \mapsto x^n$ has norm 1 and its derivative has norm $n$.
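
The unboundedness claim is easy to see numerically. The sketch below estimates supremum norms on $[0, 1]$ by sampling a uniform grid that includes both endpoints; the helper `sup_norm` and the grid size are illustrative assumptions. For each sampled exponent $n$ it records the norm of $x \mapsto x^n$ and of its derivative $x \mapsto nx^{n-1}$.

```python
def sup_norm(f, samples=1001):
    """Approximate supremum norm on [0, 1] by sampling a uniform grid with endpoints."""
    return max(abs(f(i / (samples - 1))) for i in range(samples))

exponents = (1, 2, 5, 10, 100)

# For each n: (norm of x^n, norm of its derivative n x^(n-1)).
# The default argument n=n freezes the exponent inside each lambda.
norms = {
    n: (sup_norm(lambda x, n=n: x ** n),
        sup_norm(lambda x, n=n: n * x ** (n - 1)))
    for n in exponents
}

# Ratio ||L(F)|| / ||F|| for F(x) = x^n grows without bound.
ratios = {n: norms[n][1] / norms[n][0] for n in exponents}
```

Both suprema are attained at $x = 1$, so the grid values are exact here: the ratio equals $n$, and no single constant can bound the derivative operator on $X_0$.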


Corollary 12.26 (Closed Graph Theorem). If $L : X \to Y$ is a linear operator
from a Banach space $X$ into a Banach space $Y$ such that the graph of $L$ is a closed
subset of $X \times Y$, then $L$ is a bounded linear operator.


PROOF. Make $X \oplus Y$ into a Banach space by defining $\|(x, y)\| = \|x\|_X + \|y\|_Y$.
The graph $G = \{(x, L(x)) \mid x \in X\}$ of $L$ is a vector subspace of $X \oplus Y$ since
$L$ is linear, and it is closed by hypothesis. Thus $G$ is a Banach space. The
linear operator $P : G \to X$ given by $P((x, L(x))) = x$ is one-one and onto, and
Corollary 12.25 shows that the linear operator $P^{-1} : X \to G$ given by $P^{-1}(x) =
(x, L(x))$ is continuous. If $E$ denotes the projection of $X \oplus Y$ to the $Y$ coordinate,
then $E$ is bounded with norm $\le 1$, and hence the restriction $E|_G : G \to Y$ is
bounded with norm $\le 1$. Therefore the composition $(E|_G) \circ P^{-1} : X \to Y$ is
bounded. But $((E|_G) \circ P^{-1})(x) = E(x, L(x)) = L(x)$, and thus $L$ is bounded.


EXAMPLE. Suppose that a Banach space $X$ is the vector-space direct sum of
two closed vector subspaces: $X = Y \oplus Y'$. Let $E : X \to Y$ be the projection of
$X$ on $Y$ given by $E(y + y') = y$. Corollary 12.26 implies that $E$ is bounded. In
fact, let $x_n = y_n + y_n'$ define a sequence in $X$, so that $(x_n, y_n)$ defines a sequence
in the graph of $E$. Suppose that $\lim_n (x_n, y_n) = (x_0, y_0)$ in $X \times X$, i.e., that
$\lim_n x_n = x_0$ and $\lim_n y_n = y_0$. Here $x_0$ is in $X$, and $y_0$ is in $Y$ since $Y$ is closed.
Then $y_0' = \lim_n y_n' = \lim_n x_n - \lim_n y_n = x_0 - y_0$, and this is in $Y'$ since $Y'$ is
closed. The equality $x_0 = y_0 + y_0'$ shows that $E(x_0) = y_0$, and therefore $(x_0, y_0)$
is in the graph of $E$. In other words, the graph of $E$ is closed. We conclude from
Corollary 12.26 that $E$ is bounded.


7. Problems


Let X be a normed linear space. 

(a) Prove that the closure of the open ball of radius r and center x9 is the closed 
ball of radius r and center xo. 

(b) If X is complete, prove that any decreasing sequence of closed balls has 
nonempty intersection. 


The normed linear space C“?([a, b]) was defined in Section 1. Prove that it is 

complete. 

The normed linear space H®(D) and its vector subspace A(D) were defined in 

Section 1. Prove that H°(D) is complete and that A(D) is a closed subspace, 

hence complete. 

Let X bea Banach space, let Y be a closed vector subspace, and define ||x + Y || = 

infyey ||x + y|| for x + Y in the quotient vector space X/Y. 

(a) Show that || - +Y|| is a norm for X/Y. 

(b) By replacing a Cauchy sequence {x, + Y} in X/Y by a subsequence such 
that ||X,, — Xn, + Y|l < 2-*, show that the subsequence can be lifted to a 
Cauchy sequence in X and deduce that X/Y is a Banach space. 


5. Let $v_1, \dots, v_n$ be vectors in an inner-product space. Their Gram matrix is the
Hermitian matrix of inner products given by $G(v_1, \dots, v_n) = [(v_i, v_j)]$, and
$\det G(v_1, \dots, v_n)$ is called their Gram determinant.

(a) If $c_1, \dots, c_n$ are in $\mathbb{C}$, let $c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$. Prove that
$c^{\mathrm{tr}} G(v_1, \dots, v_n)\, \bar{c} = \|c_1 v_1 + \cdots + c_n v_n\|^2$.

(b) Making use of the finite-dimensional Spectral Theorem, prove that there
exists a unitary matrix $u$ such that the matrix $u^{-1} G(v_1, \dots, v_n)\, u$ is diagonal
with diagonal entries $\ge 0$.

(c) Prove that $\det G(v_1, \dots, v_n) \ge 0$ with equality if and only if $v_1, \dots, v_n$ are
linearly dependent. (This generalizes the Schwarz inequality.)


6. (Gram-Schmidt orthogonalization process) Let $(u_1, \dots, u_n)$ be a linearly
independent ordered set in an inner-product space, and inductively define $v_1' = u_1$,
$v_1 = \|v_1'\|^{-1} v_1'$, $v_k' = u_k - \sum_{j=1}^{k-1} (u_k, v_j) v_j$, and $v_k = \|v_k'\|^{-1} v_k'$. Prove that
the vectors $v_1, \dots, v_n$ are well defined, that $v_1, \dots, v_n$ are orthonormal, and that
for each $k$ with $1 \le k \le n$, $\operatorname{span}\{v_1, \dots, v_k\} = \operatorname{span}\{u_1, \dots, u_k\}$.
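
The inductive definition in the Gram-Schmidt problem can be tried numerically before it is proved. The following is only a sketch under stated assumptions: the space is taken to be $\mathbb{C}^n$ with the standard inner product (linear in the first variable), and the names `inner` and `gram_schmidt` are illustrative and do not come from the text.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, linear in the first variable (an assumption)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def gram_schmidt(us):
    """Return v_1, ..., v_n built from u_1, ..., u_n by the inductive formulas."""
    vs = []
    for u in us:
        # v_k' = u_k - sum_{j<k} (u_k, v_j) v_j
        w = list(u)
        for v in vs:
            coeff = inner(u, v)
            w = [wi - coeff * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(inner(w, w).real)
        if norm < 1e-12:
            # v_k' = 0 happens exactly when the u's are linearly dependent
            raise ValueError("input vectors are (numerically) linearly dependent")
        # v_k = ||v_k'||^{-1} v_k'
        vs.append([wi / norm for wi in w])
    return vs
```

The check for a vanishing $v_k'$ mirrors the "well defined" part of the problem: the division by $\|v_k'\|$ is legitimate only because linear independence keeps $v_k'$ nonzero.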


7. Let $H_1$ and $H_2$ be Hilbert spaces with respective orthonormal bases $\{u_\alpha\}$ and
$\{v_\beta\}$. If there is a one-one function carrying the one orthonormal basis onto the
other, prove that there is a bounded linear operator $F : H_1 \to H_2$ carrying $H_1$
onto $H_2$ and preserving distances. Deduce that $H_1$ and $H_2$ are isomorphic as
Hilbert spaces if and only if they have the same Hilbert space dimension.
8. Let $(S, \mu)$ be a $\sigma$-finite measure space, and let $f$ be in $L^\infty(S, \mu)$.

(a) Show that multiplication by $f$ is a bounded linear operator on $L^2(S, \mu)$, and
find the norm of this operator.

(b) Find the adjoint of the operator in (a).


9. Suppose that $X$ is a normed linear space and that its dual $X^*$ is separable in its
norm topology, with $\{x_n^*\}$ as a countable dense set. For each $n$, choose $x_n$ in $X$
with $\|x_n\| \le 1$ and $|x_n^*(x_n)| \ge \tfrac12 \|x_n^*\|$. Prove that the set of finite linear
combinations of the $x_n$ is dense in $X$, so that $X^*$ separable implies $X$ separable.


10. By considering the discontinuous indicator function $I_{\{s_0\}}$, where $s_0$ is a limit point
of $S$, prove that the Banach space $C(S)$ is not reflexive if $S$ is compact Hausdorff
and infinite.


11. Without using the Baire Category Theorem, prove that the Uniform Boundedness
Theorem for linear functionals implies the same theorem for linear operators.


12. Suppose for each $n$ that $L_n : X \to X'$ is a bounded linear operator from a normed
linear space $X$ to a Banach space $X'$ such that $\|L_n\| \le C$ with $C$ independent
of $n$. Suppose in addition that $\{L_n(y)\}$ converges for each $y$ in a dense subset $Y$
of $X$. Prove that $L(x) = \lim_n L_n(x)$ exists for all $x$ in $X$ and that the resulting
function $L : X \to X'$ is a bounded linear operator with $\|L\| \le C$.


13. Let $X$ be a normed linear space, and let $\{x_\alpha\}$ be a subset of $X$. If
$\sup_\alpha |x^*(x_\alpha)| < \infty$ for each $x^*$ in $X^*$, prove that $\sup_\alpha \|x_\alpha\| < \infty$.


14. Let $X$ be a Banach space. A subset $E$ of $X$ is convex if it contains all points
$(1 - t)x + ty$ with $0 \le t \le 1$ whenever it contains $x$ and $y$.

(a) Show that any closed ball $\{y \mid \|y - x\| \le r\}$ is convex.

(b) Give an example of a decreasing sequence of nonempty bounded closed
convex sets in a Banach space with empty intersection.


15. Let $X$ and $Y$ be Banach spaces, and let $L$ be a bounded linear operator from $X$
onto $Y$. Suppose that $\{y_n\}$ is a convergent sequence in $Y$ with limit $y_0$. Prove
that there exists a constant $M$ and a sequence $\{x_n\}$ in $X$ such that $\|x_n\| \le M \|y_n\|$
for all $n$, $L(x_n) = y_n$ for all $n$, and $\{x_n\}$ is convergent.


Problems 16-18 introduce “Banach limits,” a kind of universal summability method.
Let $X$ be the real Banach space of real-valued bounded sequences $s = \{s_n\}_{n=1}^\infty$, with
the supremum norm.


16. Let $X_0$ be the smallest closed vector subspace of $X$ containing all sequences with
terms $s_1, s_2 - s_1, s_3 - s_2, \dots$ such that $\{s_n\}$ is in $X$. Prove that the sequence $e$
with all terms 1 is not in $X_0$.


17. A Banach limit is defined to be any member $x^*$ of $X^*$ with $\|x^*\| = 1$, $x^*(e) = 1$,
and $x^*(x_0) = 0$ for all $x_0$ in $X_0$. Prove that a Banach limit exists.


18. Let $\operatorname{LIM}_{n\to\infty} s_n$ denote the value of a Banach limit when applied to the member
$\{s_n\}$ of $X$. Prove that this satisfies

(a) $\operatorname{LIM}_{n\to\infty} s_n \ge 0$ if $s_n \ge 0$ for all $n$.

(b) $\operatorname{LIM}_{n\to\infty} s_{n+1} = \operatorname{LIM}_{n\to\infty} s_n$ for every $\{s_n\}$ in $X$.

(c) $\operatorname{LIM}_{n\to\infty} s_n = 0$ if all terms $s_n$ are 0 for $n$ sufficiently large.

(d) $\liminf_n s_n \le \operatorname{LIM}_{n\to\infty} s_n \le \limsup_n s_n$ for all $\{s_n\}$ in $X$.

(e) $\operatorname{LIM}_{n\to\infty} s_n = c$ if $\{s_n\}$ is convergent with limit $c$.


Problems 19-24 establish the Jordan and von Neumann Theorem that a normed linear
space satisfying the parallelogram law acquires its norm from an inner product, the
definition of the inner product being $(x, y) = \sum_k \frac{i^k}{4} \|x + i^k y\|^2$, where the sum
extends for $k \in \{0, 2\}$ if the scalars are real and extends for $k \in \{0, 1, 2, 3\}$ if the
scalars are complex. The norm is recovered from the inner product by the usual
formula $(x, x) = \|x\|^2$. Thus let $X$ be a normed linear space with norm $\|\cdot\|$ such
that the parallelogram law holds.
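
As a sanity check on the formula defining the inner product, one can compare it with a known inner product in a case where the norm certainly comes from one. The sketch below assumes the space is $\mathbb{C}^n$ (or $\mathbb{R}^n$) with the Euclidean norm; the function names are illustrative only. It verifies nothing about general normed spaces, which is the content of Problems 19-24.

```python
import math

def norm(x):
    """Euclidean norm on C^n (the assumed setting for this check)."""
    return math.sqrt(sum(abs(a) ** 2 for a in x))

def polarized_inner(x, y, complex_scalars=True):
    """(x, y) = sum_k (i^k / 4) ||x + i^k y||^2 over k in {0,1,2,3} or {0,2}."""
    ks = (0, 1, 2, 3) if complex_scalars else (0, 2)
    total = 0
    for k in ks:
        ik = 1j ** k
        total += ik / 4 * norm([a + ik * b for a, b in zip(x, y)]) ** 2
    return total

def direct_inner(x, y):
    """Standard inner product, linear in the first variable."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))
```

For vectors `x` and `y` in $\mathbb{C}^n$, `polarized_inner(x, y)` and `direct_inner(x, y)` should agree up to rounding error.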


19. Check from the definition of $(x, y)$ that $(x, x) = \|x\|^2$, that $(x, x) \ge 0$ with
equality if and only if $x = 0$, and that $(x, y) = \overline{(y, x)}$.


20. Prove the identity

\[
\|x + y + z\|^2 = \|x + y\|^2 + \|x + z\|^2 + \|y + z\|^2 - \|x\|^2 - \|y\|^2 - \|z\|^2
\]

for all $x, y, z$ in $X$.


21. Derive the formula $(x_1 + x_2, y) = (x_1, y) + (x_2, y)$ from the identity in the
previous problem.


22. Let $D$ be the set of rationals if the scalars are real, or the set of all $a + bi$ with $a$
and $b$ rational if the scalars are complex. Using the definition of $(x, y)$ and the
result of the previous problem, prove that $(rx, y) = r(x, y)$ if $r$ is in $D$.


23. By considering $\|x - ry\|^2$ for $r$ in $D$ with $r$ tending to $(x, y)/\|y\|^2$, prove that
$(\,\cdot\,, \cdot\,)$ satisfies the Schwarz inequality.


24. By estimating $|r(x, y) - (cx, y)|$ with the Schwarz inequality when $c$ is a scalar
and $r$ is a member of $D$ tending to $c$, prove that $c(x, y) = (cx, y)$, thereby
completing the proof that $(\,\cdot\,, \cdot\,)$ is an inner product.


APPENDIX 


Abstract. This appendix treats some topics that are likely to be well known by some readers and 
less well known by others. Section A1 deals with set theory and with functions: it discusses the
role of formal set theory, it works in a simplified framework that avoids too much formalism and the 
standard pitfalls, it establishes notation, and it mentions some formulas. Some emphasis is put on 
distinguishing the image and the range of a function, as this distinction is important in algebra and 
algebraic topology and therefore plays a role when real analysis begins to interact seriously with 
algebra. 

Sections A2 and A3 assume knowledge of Section I.1 and discuss topics that occur logically 
between the end of Section I.1 and the beginning of Section I.2. The first of these establishes 
the Mean Value Theorem and its standard corollaries and then goes on to define the notion of a 
continuous derivative for a function on a closed interval. The other section gives a careful treatment 
of the differentiability of an inverse function in one-variable calculus. 

Section A4 is a quick review of complex numbers, real and imaginary parts, complex conjugation,
and absolute value. Complex-valued functions appear in the book beginning in Section I.5.
Section A5 states and proves the classical Schwarz inequality, which is used in Chapter II to establish
the triangle inequality for certain metrics but is needed before that in Chapter I in the context of
Fourier series.

Sections A6 and A7 are not needed until Chapter II. The first of these defines equivalence relations 
and establishes the basic fact that they lead to a partitioning of the underlying set into equivalence 
classes. The other section discusses the connection between linear functions and matrices in the 
subject of linear algebra and summarizes the basic properties of determinants. 

Section A8, which is not needed until Chapter IV, establishes unique factorization for polynomials 
with real or complex coefficients and defines “multiplicity” for roots of complex polynomials. 

Sections A9 and A10 return to set theory. Section A9 defines partial orderings and includes 
Zorn’s Lemma, which is a powerful version of the Axiom of Choice, while Section A10 concerns 
cardinality. The material in these sections first appears in problems in Chapter V; it does not appear 
in the text until Chapter X in the case of Section A9 and until Chapter XII in the case of Section A10. 
A1. Set Theory and Functions


Real analysis typically makes use of an informal notion of set theory and notation
for it in which sets are described by properties of their elements and by operations 
on sets. This informal set theory, if allowed to be too informal, runs into certain 
paradoxes, such as the Russell paradox: “If S is the set of all sets that do not 
contain themselves as elements, is S a member of S or is it not?” The conclusion 
of the Russell paradox is that the “set” of all sets that do not contain themselves 
as elements is not in fact a set. 
Mathematicians’ experience is that such pitfalls can be avoided completely by
working within some formal axiom system for sets, of which there are several
that are well established. A basic one is “Zermelo-Fraenkel set theory,” and the
remarks in this section refer specifically to it but refer to the others at least to
some extent.¹

The standard logical paradoxes are avoided by having sets, elements (or “en- 
tities”), and a membership relation € such that a € S is a meaningful statement, 
true or false, if and only if a is an element and S is a set. The terms set, element, 
and € are taken to be primitive terms of the theory that are in effect defined by 
a system of axioms. The axioms ensure the existence of many sets, including 
infinite sets, and operations on sets that lead to other sets. To make full use of 
this axiom system, one has to regard it as occurring in the context of certain rules 
of logic that tell the forms of basic statements (namely, $a = b$, $a \in S$, and “$S$
is a set”), the connectives for creating complicated statements from simple ones
(“or,” “and,” “not,” and “if ... then”), and the way that quantifiers work (“there
exists” and “for all”).

Working rigorously with such a system would likely make the development 
of mathematics unwieldy, and it might well obscure important patterns and di- 
rections. In practice, therefore, one compromises between using a formal axiom 
system and working totally informally; let us say that one works “informally but 
carefully.” The logical problems are avoided not by rigid use of an axiom system, 
but by taking care that sets do not become too “large”: one limits the sets that one 
uses to those obtained from other sets by set-theoretic operations and by passage 
to subsets.²

A feature of the axiom system that one takes advantage of in working informally 
but carefully is that the axiom system does not preclude the existence of additional 
sets beyond those forced to exist by the axioms. Thus, for example, in the subject 
of coin-tossing within probability, it is normal to work with the set of possible 
outcomes as S = {heads, tails} even though it is not apparent that requiring this 
S to be a set does not introduce some contradiction. 

It is worth emphasizing that the points of the theory at which one takes particu- 
lar care vary somewhat from subject to subject within mathematics. For example, 
it is sometimes of interest in calculus of several variables to distinguish between 
the range of a function and its image in a way that will be mentioned below, but it 
is usually not too important. In homological algebra, however, the distinction is
extremely important, and the subject loses a great deal of its impact if one blurs
the notions of range and image.


¹Mathematicians have no proof that this technique avoids problems completely. Such a proof
would be a proof of the consistency of a version of mathematics in which one can construct the
integers, and it is known that this much of mathematics cannot be proved to be consistent unless it
is in fact inconsistent.

²Not every set so obtained is to be regarded as “constructed.” The Axiom of Choice, which we
come to shortly, is an existence statement for elements in products of sets, and the result of applying
the axiom is a set that can hardly be viewed as “constructed.”

Some references for set theory that are appropriate for reading once are 
Halmos’s Naive Set Theory, Hayden—Kennison’s Zermelo—Fraenkel Set Theory, 
and Chapter 0 and the appendix of Kelley’s General Topology. The Kelley book 
is one that uses the word “class” as a primitive term more general than “set”; it 
develops von Neumann set theory. 


All that being said, let us now introduce the familiar terms, constructions, 
and notation that one associates with set theory. To cut down on repetition, one 
allows some alternative words for “set,” such as family and collection. The word 
“class” is used by some authors as a synonym for “set,” but the word class is used 
in some set-theory axiom systems to refer to a more general notion than “set,” 
and it will be useful to preserve this possibility. Thus a class can be a set, but we 
allow ourselves to speak, for example, of the class of all groups even though this 
class is too large to be a set. Alternative terms for “element” are member and 
point; we shall not use the term “entity.” Instead of writing € systematically, we 
allow ourselves to write “in.” Generally, we do not use $\in$ in sentences of text as
an abbreviation for an expression like “is in” that contains a verb. 

If $A$ and $B$ are two sets, some familiar operations on them are the union $A \cup B$,
the intersection $A \cap B$, and the difference $A - B$, all defined in the usual way in
terms of the elements they contain. Notation for the difference of sets varies from
author to author; some other authors write $A \setminus B$ or $A \sim B$ for difference, but
this book uses $A - B$. If one is thinking of $A$ as a universe, one may abbreviate
$A - B$ as $B^c$, the complement of $B$ in $A$. The empty set $\varnothing$ is a set, and so is the
set of all subsets of a set $A$, which is sometimes denoted by $2^A$. Inclusion of a
subset $A$ in a set $B$ is written $A \subseteq B$ or $B \supseteq A$. Inclusion that does not permit
equality is denoted by $A \subsetneq B$ or $B \supsetneq A$; in this case one says that $A$ is a proper
subset of $B$ or that $A$ is properly contained in $B$.

If $A$ is a set, the singleton $\{A\}$ is a set with just the one member $A$. Another
operation is unordered pair, whose formal definition is $\{A, B\} = \{A\} \cup \{B\}$ and
whose informal meaning is a set of two elements in which we cannot distinguish
either element over the other. Still another operation is ordered pair, whose
formal definition is $(A, B) = \{\{A\}, \{A, B\}\}$. It is customary to think of an
ordered pair as a set with two elements in which one of the elements can be
distinguished as coming first.³


³Unfortunately a “sequence” as in Chapter I gets denoted by $\{x_1, x_2, \dots\}$ or $\{x_n\}_{n=1}^\infty$. If its
notation were really consistent with the above definitions, we might infer, inaccurately, that the
order of the terms of the sequence does not matter. The notation for unordered pairs, ordered pairs,
and sequences is, however, traditional, and it will not be changed here.


Let $A$ and $B$ be two sets. The set of all ordered pairs of an element of $A$ and
an element of $B$ is a set denoted by $A \times B$; it is called the product of $A$ and $B$
or the Cartesian product. A relation between a set $A$ and a set $B$ is a subset of
$A \times B$. Functions, which are to be defined in a moment, provide examples. Two
examples of relations that are usually not functions are “equivalence relations,”
which are discussed in Section A6, and “partial orderings,” which are discussed
in Section A9.
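
The formal definitions of unordered and ordered pairs given above can be modeled directly with immutable sets. This is only an illustrative sketch: Python's `frozenset` stands in for "set," and the function names are hypothetical rather than anything used in the text.

```python
def unordered_pair(a, b):
    """{A, B} = {A} ∪ {B}."""
    return frozenset([a]) | frozenset([b])

def ordered_pair(a, b):
    """(A, B) = {{A}, {A, B}}."""
    return frozenset([frozenset([a]), unordered_pair(a, b)])

def first(pair):
    """Recover the first coordinate: the one element common to every member."""
    common = frozenset.intersection(*pair)
    return next(iter(common))
```

With these definitions `unordered_pair(1, 2)` equals `unordered_pair(2, 1)`, whereas `ordered_pair(1, 2)` and `ordered_pair(2, 1)` differ, which is the sense in which one element of an ordered pair is distinguished as coming first.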

If $A$ and $B$ are sets, a relation $f$ between $A$ and $B$ is said to be a function,
written $f : A \to B$, if for each $x \in A$ there is some $y \in B$ with $(x, y)$ in $f$ and,
whenever $y \in B$ and $z \in B$ are such that $(x, y)$ and $(x, z)$ are in $f$, then $y = z$.
If $(x, y)$ is in $f$, we write $f(x) = y$.
In this informal but careful definition of function, the function consists of more
than just a set of ordered pairs; it consists of the set of ordered pairs regarded as
a subset of $A \times B$. This careful definition makes it meaningful to say that the
set $A$ is the domain, the set $B$ is the range, and the subset of $y \in B$ such that
$y = f(x)$ for some $x \in A$ is the image of $f$. The image is also denoted by
$f(A)$. Sometimes a function $f$ is described in terms of what happens to typical
elements, and then the notation is $x \mapsto f(x)$ or $x \mapsto y$, possibly with $y$ given by
some formula or by some description in words about how it is obtained from $x$.
Sometimes a function $f$ is written as $f(\,\cdot\,)$, with a dot indicating the placement
of the variable; this notation is especially helpful in working with restrictions of
functions, which we come to in a moment, and with functions of two variables
when one of the variables is held fixed. This notation is useful also for functions
that involve unusual symbols, such as the absolute value function $x \mapsto |x|$, which
in this notation becomes $|\cdot|$. The word map or mapping is sometimes used
for “function” and for the operation of a function, particularly when a geometric 
context for the function is of importance. 

Often mathematicians are not so careful with the definition of function. De- 
pending on the degree of informality that is allowed, one may occasionally refer 
to a function as $f(x)$ when it should be called $f$ or $x \mapsto f(x)$. If any confusion is
possible, it is wise to use the more rigorous notation. Another habit of informality
is to regard a function $f : A \to B$ as simply a set of ordered pairs. Thus two
functions $f_1 : A \to B$ and $f_2 : A \to C$ become the same if $f_1(a) = f_2(a)$ for
all $a$ in $A$. With the less careful definition, the notion of the range of a function is
not really well defined. The less careful definition can lead to trouble in algebra, 
but it does not often lead to trouble in real analysis until one gets to a level where 
algebra and analysis merge somewhat. 

The set of all functions from a set $A$ to a set $B$ is a set. It is sometimes denoted
by $B^A$. The special case $2^A$ that arose with subsets comes by regarding 2 as a
set $\{1, 2\}$ and identifying a function $f$ from $A$ into $\{1, 2\}$ with the subset of all
elements $x$ of $A$ for which $f(x) = 1$.

If a subset $B$ of a set $A$ may be described by some distinguishing property
$P$ of its elements, we may write this relationship as $B = \{x \in A \mid P\}$. For
example, the function $f$ in the previous paragraph is identified with the subset
$\{x \in A \mid f(x) = 1\}$. Another example is the image of a general function
$f : A \to B$, namely $f(A) = \{y \in B \mid y = f(x) \text{ for some } x \in A\}$. Still more
generally along these lines, if $E$ is any subset of $A$, then $f(E)$ denotes the set
$\{y \in B \mid y = f(x) \text{ for some } x \in E\}$. Some authors use a colon instead of a
vertical line in this notation.

This book frequently uses sets denoted by expressions like $\bigcup_{x \in S} A_x$, an indexed
union, where $S$ is a set that is usually nonempty. If $S$ is the set $\{1, 2\}$, this
reduces to $A_1 \cup A_2$. In the general case it is understood that we have an unnamed
function, say $f$, given by $x \mapsto A_x$, having domain $S$ and range an unnamed set
$T$, and $\bigcup_{x \in S} A_x$ is the set of all $y \in T$ such that $y$ is in $A_x$ for some $x \in S$. When
$S$ is understood, we may write $\bigcup_x A_x$ instead of $\bigcup_{x \in S} A_x$. Indexed intersections
$\bigcap_{x \in S} A_x$ are defined similarly, and this time it is essential to disallow $S$ empty
because otherwise the intersection cannot be a set in any useful set theory.

There is also an indexed Cartesian product $\times_{x \in S} A_x$ that specializes in the
case that $S = \{1, 2\}$ to $A_1 \times A_2$. Usually $S$ is assumed nonempty. This Cartesian
product is the set of all functions $f$ from $S$ into $\bigcup_{x \in S} A_x$ such that $f(x)$ is in
$A_x$ for all $x \in S$. In the special case that $S$ is $\{1, \dots, n\}$, the Cartesian product
is the set of ordered $n$-tuples from $n$ sets $A_1, \dots, A_n$ and may be denoted by
$A_1 \times \cdots \times A_n$; its members may be denoted by $(a_1, \dots, a_n)$ with $a_j \in A_j$ for
$1 \le j \le n$. When the factors of a Cartesian product have some additional
algebraic structure, the notation for the Cartesian product is sometimes altered;
for example, the Cartesian product of groups $A_x$ is denoted by $\prod_{x \in S} A_x$.
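
The description of an indexed Cartesian product as a set of functions can be made concrete for finite index sets. In the sketch below, which is an illustration and not the book's construction, a family $x \mapsto A_x$ is modeled as a Python `dict` from indices to finite sets, and each member of the product is returned as a `dict` $f$ with $f(x) \in A_x$; the name `indexed_product` is hypothetical.

```python
from itertools import product

def indexed_product(family):
    """Return every function f on the index set with f(x) in A_x, each as a dict."""
    index = list(family)
    factors = [family[x] for x in index]
    return [dict(zip(index, choice)) for choice in product(*factors)]

if __name__ == "__main__":
    family = {1: {"a", "b"}, 2: {0, 1, 2}}
    for f in indexed_product(family):
        print(f)
```

For a finite index set, the product is empty exactly when some factor is empty, which is the finite case in which the Axiom of Choice stated next is a theorem.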

It is completely normal in real analysis, and it is the practice in this book, to 
take the following axiom as part of one’s set theory; the axiom is normally used 
without specific mention. 


Axiom of Choice. The Cartesian product of nonempty sets is nonempty. 


If the index set is finite, then the Axiom of Choice reduces to a theorem of 
set theory. The axiom is often used quite innocently with a countably infinite 
index set. For example, Proposition 1.7c asserts that any sequence in $\mathbb{R}^*$ has a
subsequence converging to $\limsup a_n$, and the proof constructs one member of the
sequence at a time. When these members have some flexibility in their definitions,
as is the case with the proof as it is written for Proposition 1.7c, the Axiom of
Choice is being invoked. When the members instead have specific definitions,
such as “the term $a_n$ such that $n$ is the smallest integer satisfying such-and-such
properties,” the axiom is not being invoked. The proof in the text of Proposition
1.7c can be rewritten with specific definitions and thereby can avoid invoking the 
axiom, but there is no point in undertaking this rewriting. In Chapter II the axiom 
is invoked in situations in which the index set is uncountable; uses of compactness 
provide a number of examples. 
From the Axiom of Choice, one can deduce a powerful tool known as Zorn’s
Lemma, whose use it is normal to acknowledge. Zorn’s Lemma appears in 
Section A9 and is used in problems beginning in Chapter V and in the text 
beginning in Chapter X. 

If $f : A \to B$ is a function and $B$ is a subset of $B'$, then $f$ can be regarded
as a function with range $B'$ in a natural way. Namely, the set of ordered pairs is
unchanged but is to be regarded as a subset of $A \times B'$ rather than $A \times B$.

Let $f : A \to B$ and $g : B \to C$ be two functions such that the range of $f$
equals the domain of $g$. The composition $g \circ f : A \to C$ is the function with
$(g \circ f)(x) = g(f(x))$ for all $x$. Because of the construction in the previous
paragraph, it is meaningful to define the composition more generally when the
range of $f$ is merely a subset of the domain of $g$.

A function $f : A \to B$ is said to be one-one if $f(x_1) \ne f(x_2)$ whenever $x_1$
and $x_2$ are distinct members of $A$. The function is said to be onto, or often “onto
$B$,” if its image equals its range. The terminology “onto $B$” avoids confusion: it
specifies the image and thereby guards against the use of the less careful definition 
of function mentioned above. A mathematical audience often contains some 
people who use the careful definition of function and some people who use the 
less careful definition. For the latter kind of person, a function is always onto 
something, namely its image, and a statement that a particular function is onto 
might be regarded as a tautology. 

When a function $f : A \to B$ is one-one and is onto $B$, there exists a function
$g : B \to A$ such that $g \circ f$ is the identity function on $A$ and $f \circ g$ is the identity
function on $B$. The function $g$ is unique, and it is defined by the condition, for
$y \in B$, that $g(y)$ is the unique $x \in A$ with $f(x) = y$. The function $g$ is called
the inverse function of $f$ and is often denoted by $f^{-1}$.

Conversely if $f : A \to B$ has an inverse function, then $f$ is one-one and
is onto $B$. The reason is that a composition $g \circ f$ can be one-one only if $f$ is
one-one, and in addition, that a composition $f \circ g$ can be onto the range of $f$
only if $f$ is onto its range.

If $f : A \to B$ is a function and $E$ is a subset of $A$, the restriction of $f$
to $E$, denoted by $f|_E$, is the function $f|_E : E \to B$ consisting of all ordered
pairs $(x, f(x))$ with $x \in E$, this set being regarded as a subset of $E \times B$, not of
$A \times B$. One especially common example of a restriction is restriction to one of the
variables of a function of two variables, and then the idea of using a dot in place
of a variable can be helpful notationally. Thus the function of two variables might
be indicated by $f$ or $(x, y) \mapsto f(x, y)$, and the restriction to the first variable,
for fixed value of the second variable, would be $f(\,\cdot\,, y)$ or $x \mapsto f(x, y)$.

We conclude this section with a discussion of direct and inverse images of
sets under functions. If $f : A \to B$ is a function and $E$ is a subset of $A$, we
have defined $f(E) = \{y \in B \mid y = f(x) \text{ for some } x \in E\}$. This is the same
as the image of $f|_E$ and is frequently called the image or direct image of $E$
under $f$. The notion of direct image does not behave well with respect to some
set-theoretic operations: it respects unions but not intersections. In the case of
unions, we have

\[
f\Bigl(\bigcup_{s \in S} E_s\Bigr) = \bigcup_{s \in S} f(E_s);
\]

the inclusion $\supseteq$ follows since $f\bigl(\bigcup_{s \in S} E_s\bigr) \supseteq f(E_s)$ for each $s$, and the inclusion
$\subseteq$ follows because any member of the left side is $f$ of a member of some $E_s$. In
the case of intersections, the question $f(E \cap F) = f(E) \cap f(F)$ can easily have
a negative answer, the correct general statement being $f(E \cap F) \subseteq f(E) \cap f(F)$.
An example with equality failing occurs when $A = \{1, 2, 3\}$, $B = \{1, 2\}$, $f(1) =
f(3) = 1$, $f(2) = 2$, $E = \{1, 2\}$, and $F = \{2, 3\}$ because $f(E \cap F) = \{2\}$ and
$f(E) \cap f(F) = \{1, 2\}$.

If $f : A \to B$ is a function and $E$ is a subset of $B$, the inverse image of $E$
under $f$ is the set $f^{-1}(E) = \{x \in A \mid f(x) \in E\}$. This is well defined even if $f$
does not have an inverse function. (If $f$ does have an inverse function $f^{-1}$, then
the inverse image of $E$ under $f$ coincides with the direct image of $E$ under $f^{-1}$.)

Unlike direct images, inverse images behave well under set-theoretic operations.
If $f : A \to B$ is a function and $\{E_s \mid s \in S\}$ is a set of subsets of $B$,
then

\[
\begin{aligned}
f^{-1}\Bigl(\bigcap_{s \in S} E_s\Bigr) &= \bigcap_{s \in S} f^{-1}(E_s), \\
f^{-1}\Bigl(\bigcup_{s \in S} E_s\Bigr) &= \bigcup_{s \in S} f^{-1}(E_s), \\
f^{-1}(E_s^c) &= \bigl(f^{-1}(E_s)\bigr)^c.
\end{aligned}
\]

In the third of these identities, the complement on the left side is taken within
$B$, and the complement on the right side is taken within $A$. To prove the
first identity, we observe that $f^{-1}\bigl(\bigcap_{s \in S} E_s\bigr) \subseteq f^{-1}(E_s)$ for each $s \in S$ and
hence $f^{-1}\bigl(\bigcap_{s \in S} E_s\bigr) \subseteq \bigcap_{s \in S} f^{-1}(E_s)$. For the reverse inclusion, if $x$ is in
$\bigcap_{s \in S} f^{-1}(E_s)$, then $x$ is in $f^{-1}(E_s)$ for each $s$ and thus $f(x)$ is in $E_s$ for each $s$.
Hence $f(x)$ is in $\bigcap_{s \in S} E_s$, and $x$ is in $f^{-1}\bigl(\bigcap_{s \in S} E_s\bigr)$. This proves the reverse
inclusion. The second and third identities are proved similarly.
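
The behavior of direct and inverse images described above can be checked on the three-point example from the text. The sketch below is illustrative only: a function on a finite set is modeled as a Python `dict`, and the helper names `image` and `preimage` are hypothetical.

```python
def image(f, E):
    """Direct image f(E) = {f(x) : x in E}."""
    return {f[x] for x in E}

def preimage(f, E):
    """Inverse image f^{-1}(E) = {x in A : f(x) in E}, where A is the domain of f."""
    return {x for x in f if f[x] in E}

# The example from the text: A = {1, 2, 3}, B = {1, 2}, f(1) = f(3) = 1, f(2) = 2.
f = {1: 1, 2: 2, 3: 1}
E, F = {1, 2}, {2, 3}

direct_of_intersection = image(f, E & F)            # {2}
intersection_of_directs = image(f, E) & image(f, F)  # {1, 2}
```

Here the two direct-image sets differ, while `preimage(f, G & H)` equals `preimage(f, G) & preimage(f, H)` for all subsets `G` and `H` of $B$, in line with the identities for inverse images.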


A2. Mean Value Theorem and Some Consequences 


This section states and proves the Mean Value Theorem and two standard corol- 
laries, and then it discusses the notion of a function with a continuous derivative 
on a closed interval. It makes use of results in Section I.1 of the text. 
Lemma. Let $[a, b]$ be a nontrivial closed interval, and let $f : [a, b] \to \mathbb{R}$ be
a continuous function that is differentiable on $(a, b)$ and has $f(a) = f(b) = 0$.
Then the derivative $f'$ satisfies $f'(c) = 0$ for some $c$ with $a < c < b$.


PROOF. We divide matters into three cases. If $f(x) > 0$ for some $x$, let
$c$ be a member of $[a, b]$ where $f$ attains its maximum (existence by Theorem
1.11). Since $f(x) > 0$ somewhere, we must have $a < c < b$. Thus $f'(c)$
exists. If $f'(c) > 0$, then the inequality $\lim_{h \to 0} h^{-1}\bigl(f(c + h) - f(c)\bigr) > 0$
forces $f(c + h) > f(c)$ for $h$ positive and sufficiently small, in contradiction to
the fact that $f$ attains its maximum at $c$. Similarly if $f'(c) < 0$, then we find
that $f(c - h) > f(c)$ for $h$ positive and sufficiently small, and again we have a
contradiction. We conclude that $f'(c) = 0$.

If $f(x) \le 0$ for all $x$ and $f(x) < 0$ for some $x$, let $c$ instead be a member of
$[a, b]$ where $f$ attains its minimum. Arguing in the same way as in the previous
paragraph, we find that $f'(c) = 0$.

Finally if $f(x) = 0$ for all $x$, then $f'(x) = 0$ for $a < x < b$, and $f'(c) = 0$
for $c = \tfrac12(a + b)$, for example.


Mean Value Theorem. Let $[a, b]$ be a nontrivial closed interval. If
$f : [a, b] \to \mathbb{R}$ is a continuous function that is differentiable on $(a, b)$, then

\[
\frac{f(b) - f(a)}{b - a} = f'(c)
\]

for some $c$ with $a < c < b$.


PROOF. Apply the lemma to the function

\[
g(x) = f(x) - f(a) - (x - a)\,\frac{f(b) - f(a)}{b - a},
\]

which has $g(a) = g(b) = 0$ and $g'(x) = f'(x) - \dfrac{f(b) - f(a)}{b - a}$.
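
The Mean Value Theorem asserts only that a suitable $c$ exists; in concrete cases one can locate it numerically. The sketch below is an illustration under extra assumptions not made in the theorem: the derivative `df` is supplied explicitly, is continuous, and $f'(x)$ minus the difference quotient changes sign between the endpoints, so that bisection applies. The name `mean_value_point` is hypothetical.

```python
def mean_value_point(f, df, a, b, tol=1e-12):
    """Find c in [a, b] with df(c) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes df is continuous and df - slope has opposite signs at a and b.
    """
    slope = (f(b) - f(a)) / (b - a)

    def h(x):
        return df(x) - slope  # this is g'(x) for the function g in the proof

    lo, hi = a, b
    if h(lo) * h(hi) > 0:
        raise ValueError("no sign change at the endpoints; bisection does not apply")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

For $f(x) = x^3$ on $[0, 2]$ the difference quotient is 4, and the call `mean_value_point(lambda x: x**3, lambda x: 3*x*x, 0.0, 2.0)` approximates the point $c = 2/\sqrt{3}$ where $3c^2 = 4$.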

Corollary 1. A differentiable function $f : (a, b) \to \mathbb{R}$ whose derivative is 0
everywhere on $(a, b)$ is a constant function.

PROOF. If $f(a') \ne f(b')$ for some $a'$ and $b'$ in $(a, b)$, then the Mean Value
Theorem produces some $c$ between $a'$ and $b'$ where $f'(c) \ne 0$.

Corollary 2. A differentiable function $f : (a, b) \to \mathbb{R}$ whose derivative is
$> 0$ everywhere on $(a, b)$ is strictly increasing on $(a, b)$.


PROOF. If $a' < b'$ and $f(a') \ge f(b')$, then the Mean Value Theorem produces
some $c$ with $a' < c < b'$ where $f'(c) \le 0$.
In the setting of the Mean Value Theorem, it can happen that $f'(x)$ has a
finite limit $C$ as $x$ decreases to $a$ (or as $x$ increases to $b$). This terminology
means that for any $\varepsilon > 0$, there exists some $\delta > 0$ such that $|f'(x) - C| < \varepsilon$
whenever $a < x < a + \delta$. In this case, $f$ can be extended to a function $F$ defined
and continuous on $(-\infty, b]$, differentiable on $(-\infty, b)$, in such a way that $F'$ is
continuous at $a$. In fact, the extended definition is

\[
F(x) =
\begin{cases}
f(x) & \text{for } a \le x \le b, \\
f(a) + C(x - a) & \text{for } -\infty < x < a.
\end{cases}
\]

To see that $F'(a)$ exists for the extended function $F$, let $\varepsilon > 0$ be given and choose
$\delta > 0$ such that $a < x < a + \delta$ implies $|f'(x) - C| < \varepsilon$. If $a < x < a + \delta$, then
the Mean Value Theorem gives

\[
\frac{F(x) - F(a)}{x - a} = F'(c)
\]

with $a < c < x < a + \delta$, and hence $\Bigl|\dfrac{F(x) - F(a)}{x - a} - C\Bigr| < \varepsilon$. If $a - \delta < x < a$, then

\[
\Bigl|\frac{F(x) - F(a)}{x - a} - C\Bigr| = 0.
\]

Thus $F'(a)$ exists and equals $C$. The definitions make $\lim_{x \to a} F'(x) = F'(a)$,
and hence $F'$ is continuous at $a$.


As a consequence of this construction, it makes sense to say that a continuous 
function $f : [a, b] \to \mathbb{R}$ with a derivative on $(a, b)$ has a continuous derivative
at one or both endpoints. This phrasing means that f’ has a finite limit at the 
endpoint in question, and it is equivalent to say that f extends to a larger set 
so as to be differentiable in an open interval about the endpoint and to have its 
derivative be continuous at the endpoint. 


A3. Inverse Function Theorem in One Variable 


This section addresses one of the “further topics” mentioned at the end of Sec- 
tion I.1 and assumes knowledge of Section I.1 and some additional facts about 
continuity and differentiability of functions of a real variable. The topic is that 
of differentiability of inverse functions, the nub of the matter being continuity 
of the inverse function. The topic is one that is sometimes skipped in calculus 
courses and slighted in courses in real variable theory. Yet it is necessary for 
the development of one of the two functions exp and log, of one of the two 
functions sin and arcsin, and of one of the two functions tan and arctan unless 
actual constructions of both members of a pair are given. In principle the matter 
arises also with differentiation of the function $x^{1/q}$ on $(0, \infty)$, but the proposition
of this section can be readily avoided in that case by explicit calculations.
Proposition. Let $(a, b)$ be an open interval in $\mathbb{R}$, possibly infinite, and let
$f : (a, b) \to \mathbb{R}$ be a function with a continuous everywhere-positive derivative.
Then $f$ is strictly increasing and has an interval $(c, d)$, possibly infinite, as its
image. The inverse function $g : (c, d) \to (a, b)$ exists and has a continuous
derivative given by $g'(y) = 1/f'(g(y))$.


PROOF. The function f is strictly increasing as a corollary of the Mean Value 
Theorem, and its image is an interval (c, d) because of the Intermediate Value 
Theorem (Theorem 1.12). Being one-one and onto, f has an inverse function g, 
according to Section Al. Fix yo € (c,d), fix c’and d’ such that c < c’ < yo < 
d' < d,and consider y ¥ yg in (c’,d’). Put x = g(y), x9 = g(yo), a’ = g(c’), 
and b’ = g(d’). Thena <a’ < x9 < b’ < bsince f is strictly increasing. 

By Theorem 1.11, there exist real numbers $m$ and $M$ such that 
$0 < m \le f'(t) \le M$ for all $t \in [a',b']$. The Mean Value Theorem produces 
$\xi$ between $x_0$ and $x$ such that 

\[ |y - y_0| = |f(x) - f(x_0)| = |f'(\xi)|\,|x - x_0| \ge m|x - x_0|, \]

and hence $|x - x_0| \le m^{-1}|y - y_0|$. Since $g$ is one-one, we have $x \neq x_0$. 
Also, $f(x) = y \neq y_0 = f(x_0)$. Thus it makes sense to form 

\[ \frac{g(y) - g(y_0)}{y - y_0} = \frac{x - x_0}{f(x) - f(x_0)}. \]

Let $\epsilon > 0$ be given. Since 
$\lim_{t \to x_0} \frac{f(t) - f(x_0)}{t - x_0} = f'(x_0) \neq 0$, we have 

\[ \lim_{t \to x_0} \frac{t - x_0}{f(t) - f(x_0)} = \frac{1}{f'(x_0)}. \]

Choose $\eta > 0$ such that 

\[ \left| \frac{t - x_0}{f(t) - f(x_0)} - \frac{1}{f'(x_0)} \right| < \epsilon \]

as long as $|t - x_0| < \eta$ with $t \neq x_0$ and $t \in [a',b']$. Then put 
$\delta = \eta m$. If $|y - y_0| < \delta$, then 
$|x - x_0| \le m^{-1}|y - y_0| < m^{-1}\delta = \eta$. Since $t = x$ satisfies 
the condition $|t - x_0| < \eta$ with $t \neq x_0$ and $t \in [a',b']$, it follows that 

\[ \left| \frac{g(y) - g(y_0)}{y - y_0} - \frac{1}{f'(x_0)} \right| 
= \left| \frac{x - x_0}{f(x) - f(x_0)} - \frac{1}{f'(x_0)} \right| < \epsilon \]

whenever $|y - y_0| < \delta$. Since $\epsilon$ is arbitrary, the conclusion is that 
$g'(y_0) = 1/f'(g(y_0))$. Since $g$ is differentiable, $g$ is continuous and also the 
composition $f' \circ g$ is continuous. Because $f' \circ g$ is nowhere zero, 
$g' = 1/(f' \circ g)$ is continuous. This completes the proof. 
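
As a numerical illustration of the formula $g'(y) = 1/f'(g(y))$, and not as part of the proof, the following sketch uses the sample function $f(x) = x^3 + x$. That choice is an assumption made here because $f'(x) = 3x^2 + 1$ is continuous and everywhere positive. The helper names are hypothetical, and the inverse is computed by bisection, which uses only the fact that $f$ is strictly increasing.

```python
def f(x):
    # Sample function with continuous, everywhere-positive derivative.
    return x**3 + x


def f_prime(x):
    return 3 * x**2 + 1


def g(y, lo=-10.0, hi=10.0, iters=200):
    """Inverse of f on [lo, hi] by bisection (valid since f is strictly increasing)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def g_prime_numeric(y, h=1e-5):
    # Symmetric difference quotient for the inverse function.
    return (g(y + h) - g(y - h)) / (2 * h)


def g_prime_formula(y):
    # The formula of the proposition: g'(y) = 1 / f'(g(y)).
    return 1 / f_prime(g(y))
```

For instance, $f(1) = 2$ and $f'(1) = 4$, so both `g_prime_numeric(2.0)` and `g_prime_formula(2.0)` should be close to $1/4$.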


A4. Complex Numbers 


Complex numbers are taken as known, and this section reviews their notation and 
basic properties. 

Briefly, the system $\mathbb{C}$ of complex numbers is a two-dimensional vector space 
over $\mathbb{R}$ with a distinguished basis $\{1, i\}$ and a multiplication defined 
initially by $1 \cdot 1 = 1$, $1 \cdot i = i \cdot 1 = i$, and $i \cdot i = -1$. Elements 
may then be written as $a + bi$ or $a + ib$ with $a$ and $b$ in $\mathbb{R}$; here $a$ is 
an abbreviation for $a1$. The multiplication is extended to all of $\mathbb{C}$ so that 
the distributive laws hold, i.e., so that $(a + bi)(c + di)$ can be expanded in the 
expected way. The multiplication is associative and commutative, the element $1$ acts 
as a multiplicative identity, and every nonzero element has a multiplicative inverse: 
$(a + bi)\left(\frac{a}{a^2+b^2} - i\,\frac{b}{a^2+b^2}\right) = 1$. 

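Complex conjugation is indicated by a bar: the conjugate of $a + bi$ is $a - bi$ 
if $a$ and $b$ are real, and we write $\overline{a + bi} = a - bi$. Then we have 
$\overline{z + w} = \bar{z} + \bar{w}$, $\overline{rz} = r\bar{z}$ if $r$ is real, 
and $\overline{zw} = \bar{z}\,\bar{w}$. 

The real and imaginary parts of $z = a + bi$ are $\mathrm{Re}\, z = a$ and 
$\mathrm{Im}\, z = b$. These may be computed as $\mathrm{Re}\, z = \frac{1}{2}(z + \bar{z})$ 
and $\mathrm{Im}\, z = -\frac{i}{2}(z - \bar{z})$. 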

The absolute value function of $z = a + bi$ is given by $|z| = \sqrt{a^2 + b^2}$, and 
this satisfies $|z|^2 = z\bar{z}$. It has the simple properties that $|\bar{z}| = |z|$, 
$|\mathrm{Re}\, z| \le |z|$, and $|\mathrm{Im}\, z| \le |z|$. In addition, it satisfies 

\[ |zw| = |z|\,|w| \]

because 

\[ |zw|^2 = zw\,\overline{zw} = zw\,\bar{z}\bar{w} = z\bar{z}\,w\bar{w} = |z|^2|w|^2, \]

and it satisfies the triangle inequality 

\[ |z + w| \le |z| + |w| \]

because 

\[ |z + w|^2 = (z + w)\overline{(z + w)} = z\bar{z} + z\bar{w} + w\bar{z} + w\bar{w} 
= |z|^2 + 2\,\mathrm{Re}(z\bar{w}) + |w|^2 \le |z|^2 + 2|z\bar{w}| + |w|^2 
= |z|^2 + 2|z|\,|w| + |w|^2 = (|z| + |w|)^2. \]
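
The identities of this section can be spot-checked with Python's built-in `complex` type. The sketch below is only a sanity check on two sample values chosen here as an assumption; the function name `inverse` is hypothetical and implements the displayed formula for the multiplicative inverse.

```python
import math


def inverse(z):
    """Multiplicative inverse of z = a + bi, namely a/(a^2+b^2) - i*b/(a^2+b^2)."""
    a, b = z.real, z.imag
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return complex(a / n, -b / n)


z, w = complex(3, 4), complex(1, -2)

checks = {
    # (a + bi) times its inverse is 1, up to rounding.
    "inverse": abs(z * inverse(z) - 1) < 1e-12,
    # Conjugation respects products.
    "conj_product": (z * w).conjugate() == z.conjugate() * w.conjugate(),
    # |z|^2 = z * conj(z).
    "abs_squared": math.isclose(abs(z) ** 2, (z * z.conjugate()).real),
    # |zw| = |z||w|.
    "abs_product": math.isclose(abs(z * w), abs(z) * abs(w)),
    # Triangle inequality.
    "triangle": abs(z + w) <= abs(z) + abs(w),
}
```

Each entry of `checks` is a Boolean; a `False` value would point to a mistake in the corresponding identity as coded.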


A5. Classical Schwarz Inequality 


The inequality in question is as follows.⁴ 


⁴In the classical setting below, the inequality is often called the “Cauchy–Schwarz inequality” 
and may have other people’s names attached to it as well. However, generalizations tend to be called 
simply the “Schwarz inequality,” and this book therefore drops all names but Schwarz. 

Schwarz inequality. Let $(a_1, \dots, a_n)$ and $(b_1, \dots, b_n)$ be $n$-tuples of 
complex numbers. Then 

\[ \left| \sum_{k=1}^{n} a_k \overline{b_k} \right| 
\le \left( \sum_{k=1}^{n} |a_k|^2 \right)^{1/2} \left( \sum_{k=1}^{n} |b_k|^2 \right)^{1/2}. \]


PROOF. We add $n$-tuples of complex numbers entry by entry, and we multiply 
such an $n$-tuple by a complex scalar by multiplying each entry of the $n$-tuple 
by that scalar. For any $n$-tuples of complex numbers $a = (a_1, \dots, a_n)$ and 
$b = (b_1, \dots, b_n)$, define 

\[ |a| = \left( \sum_{k=1}^{n} |a_k|^2 \right)^{1/2}, \qquad 
|b| = \left( \sum_{k=1}^{n} |b_k|^2 \right)^{1/2}, \qquad \text{and} \qquad 
(a, b) = \sum_{k=1}^{n} a_k \overline{b_k}. \]

The Schwarz inequality says that $0 \le 0$ if $b = (0, \dots, 0)$, and thus we may 
assume that $b$ is something else. In this case, $|b| \neq 0$. Then 

\[ 0 \le \big| a - |b|^{-2}(a,b)b \big|^2 
= \big( a - |b|^{-2}(a,b)b,\; a - |b|^{-2}(a,b)b \big) 
= |a|^2 - 2|b|^{-2}|(a,b)|^2 + |b|^{-4}|(a,b)|^2|b|^2 
= |a|^2 - |b|^{-2}|(a,b)|^2, \]

and the asserted inequality follows. 
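
The quantities $|a|$, $|b|$, and $(a, b)$ in the proof translate directly into code. The following sketch, with hypothetical function names, computes the gap $|a||b| - |(a,b)|$, which the inequality asserts is nonnegative; it is a numerical check, not a substitute for the proof.

```python
import math


def inner(a, b):
    """(a, b) = sum over k of a_k * conj(b_k)."""
    return sum(x * y.conjugate() for x, y in zip(a, b))


def norm(a):
    """|a| = (sum over k of |a_k|^2)^(1/2)."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))


def schwarz_gap(a, b):
    """|a||b| - |(a, b)|; nonnegative by the Schwarz inequality, up to rounding."""
    return norm(a) * norm(b) - abs(inner(a, b))
```

When `b` is a scalar multiple of `a`, the gap is zero up to rounding, which matches the case in which the nonnegative quantity in the proof vanishes.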


A6. Equivalence Relations 


An equivalence relation on a set S is a relation between S and itself, i.e., is a 
subset of S × S, satisfying three properties. We define the expression a ~ b, 
written “a is equivalent to b,” to mean that the ordered pair (a, b) is a member of 
the relation, and we say that “~” is the equivalence relation. The properties are 
(i) a ~ a for all a in S, i.e., ~ is reflexive, 
(ii) a ~ b implies b ~ a if a and b are in S, i.e., ~ is symmetric, 
(iii) a ~ b and b ~ c together imply a ~ c if a, b, and c are in S, i.e., ~ is 
transitive. 

An example occurs with S equal to the set Z of integers with a ~ b meaning 
that the difference a − b is even. The properties hold because (i) 0 is even, (ii) 
the negative of an even integer is even, and (iii) the sum of two even integers is 
even. 

There is one fundamental result about abstract equivalence relations. The 
equivalence class of a, written [a] for now, is the set of all members b of S such 
that a ~ b. 


Proposition. If ~ is an equivalence relation on a set S, then any two equiv- 
alence classes are disjoint or equal, and S is the union of all the equivalence 
classes. 

PROOF. Let [a] and [b] be the equivalence classes of members a and b of S. 
If $[a] \cap [b] \neq \varnothing$, choose c in the intersection. Then a ~ c and b ~ c. 
By (ii), c ~ b, and then by (iii), a ~ b. If d is any member of [b], then b ~ d. From 
(iii), a ~ b and b ~ d together imply a ~ d. Thus $[b] \subseteq [a]$. Reversing the 
roles of a and b, we see that $[a] \subseteq [b]$ also, whence [a] = [b]. This proves the 
first conclusion. The second conclusion follows from (i), which ensures that a is 
in [a], hence that every member of S lies in some equivalence class. 


EXAMPLE. With the equivalence relation on Z that a ~ b if a — b is even, 
there are two equivalence classes—the subset of even integers and the subset of 
odd integers. 
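
For a finite set the proposition can be made concrete. The sketch below, with the hypothetical helper `equivalence_classes`, partitions a finite collection by comparing each element with one representative per class. It assumes that the supplied relation really is reflexive, symmetric, and transitive; without that assumption one representative per class would not suffice.

```python
def equivalence_classes(elements, related):
    """Partition `elements` into equivalence classes of `related`.

    `related(a, b)` is assumed to be an equivalence relation, so it is
    enough to test each new element against one member of each class.
    """
    classes = []
    for x in elements:
        for cls in classes:
            if related(cls[0], x):
                cls.append(x)
                break
        else:
            # x is equivalent to no earlier representative: start a new class.
            classes.append([x])
    return classes


def same_parity(a, b):
    # a ~ b when a - b is even.
    return (a - b) % 2 == 0


classes = equivalence_classes(range(-3, 4), same_parity)
```

Here `classes` holds the odd and the even members of $\{-3, \dots, 3\}$ as two disjoint lists whose union is the whole set, in line with the example.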


The first two examples of equivalence relations in this book arise in Chapter II. 
The first example, which is in Section II.2 and concerns a passage from 
“pseudometric spaces” to “metric spaces,” yields equivalence classes exactly as above. 
The second example, which is in Section II.3, is a relation “is homeomorphic 
to” and implicitly is defined on the class of all metric spaces. This class is not 
a set, and Section A1 of this appendix suggested avoiding using classes that are 
not sets in order to avoid the logical paradoxes mentioned at the beginning of the 
appendix. There is not much problem with using general classes in this particular 
situation, but there is a simple approach in this situation for eliminating classes 
that are not sets and thereby following the suggestion of Section A1 without 
making an exception. The approach is to work with any subclass of metric spaces 
that is a set. The equivalence relation is well defined on the set of metric spaces 
in question, and the proposition yields equivalence classes within that set. This 
set can be an arbitrary subclass of the class of all metric spaces that happens to be 
a set, and the practical effect is the same as if the equivalence relation had been 
defined on the class of all metric spaces. 


A7. Linear Transformations, Matrices, and Determinants 


A certain amount of linear algebra, done with real or complex scalars, is taken 
as known. The topics of vectors, vector spaces, operations on matrices, row 
reduction of matrices, spanning, linear independence, bases, and dimension will 
not be reviewed here. This section will concentrate on the correspondence be- 
tween linear transformations and matrices in the finite-dimensional case, and on 
the elementary properties of determinants. So as to be able to handle real and 
complex scalars simultaneously, we denote by F either R or C. 

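The linear transformations in question will be functions with domain $\mathbb{F}^n$ and 
range $\mathbb{F}^m$. As is emphasized for the case $\mathbb{F} = \mathbb{R}$ in Section II.1, 
the members of these spaces are to be regarded as column vectors with entries in 
$\mathbb{F}$ even if, in order to save space, one occasionally writes them horizontally 
with commas between entries. This is an important convention, since it makes matrix 
operations and operations with linear transformations correspond to each other in the 
same order without the need to transpose any matrix. The standard bases for 
$\mathbb{F}^n$ and $\mathbb{F}^m$ are often denoted by $\{e_1, \dots, e_n\}$ and 
$\{u_1, \dots, u_m\}$, respectively, in this book, where 

\[ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad 
e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad 
e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \]

are $n$-entry column vectors and 

\[ u_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad 
u_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad 
u_m = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} \]

are $m$-entry column vectors. 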

A function $T : \mathbb{F}^n \to \mathbb{F}^m$ is a linear function if it satisfies 
$T(x + y) = T(x) + T(y)$ and $T(cx) = cT(x)$ for all $x$ and $y$ in $\mathbb{F}^n$ and 
all elements $c$ of $\mathbb{F}$. The terms “linear transformation” and “linear map” are 
used also. 

An example is obtained from any m-by-n matrix A with entries in F, namely 
T(x) = Ax, the right side being a matrix product. The size of A needs emphasis: 
the number of rows equals the dimension of the range, and the number of columns 
equals the dimension of the domain. 

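Conversely if $T : \mathbb{F}^n \to \mathbb{F}^m$ is a linear function, then there is a 
unique such matrix $A$ such that $T(x) = Ax$ for all $x$ in $\mathbb{F}^n$: the $j$th 
column of $A$ is $T(e_j)$ for $1 \le j \le n$. For example, if 
$T : \mathbb{R}^2 \to \mathbb{R}^2$ is the rotation about the origin counterclockwise 
through an angle $\theta$, then 
$T\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$ 
and 
$T\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}$. 
Consequently 
$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$. 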

Sometimes it is necessary to have a notation for the entries of a matrix $A$, and 
this text uses $A_{ij}$ to indicate the entry of $A$ in the $i$th row and $j$th column. 
If a matrix is defined entry by entry, the entries being $M_{ij}$, the text will 
occasionally refer to the whole matrix as $[M_{ij}]$. This convention is especially 
handy if $M_{ij}$ is given by some nontrivial expression like 
$\partial u_i / \partial x_j$ that involves $i$ and $j$. 

We can give a tidy formula for the correspondence $T \leftrightarrow A$ if we define a 
dot product in $\mathbb{F}^m$ by 

\[ (a_1, \dots, a_m) \cdot (b_1, \dots, b_m) = a_1 b_1 + \cdots + a_m b_m \]

with no complex conjugations involved. The correspondence of a linear function 
$T$ in $L(\mathbb{F}^n, \mathbb{F}^m)$ to a matrix $A$ with entries in $\mathbb{F}$ is 
then given by 

\[ A_{ij} = T(e_j) \cdot u_i. \]

The correspondence $T \leftrightarrow A$ of linear functions to matrices carries certain 
vector spaces associated to $T$ to vector spaces associated with $A$. The kernel 
of $T$, namely the set of vectors $x$ with $T(x) = 0$, corresponds to the null space 
of $A$, the set of column vectors with $Ax = 0$. The image of $T$, as defined in 
Section A1, corresponds to the column space of $A$, the linear span of the columns 
of $A$. The method of row reduction of matrices shows that 

\[ \#\{\text{columns of } A\} = \dim(\text{null space of } A) + \dim(\text{span of rows of } A), \]

while a little argument with bases shows that 

\[ \dim(\text{domain of } T) = \dim(\text{kernel of } T) + \dim(\text{image of } T). \]


In these two equations the left sides are equal, and the first terms on the two right 
sides are equal. Therefore the second terms on the two right sides are equal, and 
we obtain 


dim(span of rows of A) = dim(span of columns of A). 


The common value of the two sides of this equation is called the rank of A or 
of T. 

Under this correspondence of linear functions between column-vector spaces 
with matrices of the appropriate size, composition of linear functions corresponds 
to matrix product in the same written order. In other words, suppose that 
$T : \mathbb{F}^n \to \mathbb{F}^m$ corresponds to $A$ of size $m$-by-$n$ and that 
$U : \mathbb{F}^m \to \mathbb{F}^k$ corresponds to $B$ of size $k$-by-$m$. Then 
$U \circ T : \mathbb{F}^n \to \mathbb{F}^k$ corresponds to $BA$ of size $k$-by-$n$. 
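
The rule that the $j$th column of $A$ is $T(e_j)$, together with the statement that composition corresponds to matrix product, can be tried out on small examples. In the sketch below all names (`matrix_of`, `apply`, `matmul`, `rotation`) are hypothetical, vectors are plain Python lists standing in for column vectors, and the function passed to `matrix_of` is assumed to be linear.

```python
import math


def matrix_of(T, n):
    """Matrix A (as a list of rows) with T(x) = A x, for a linear T on F^n.

    Column j of A is T(e_j), where e_j is the j-th standard basis vector.
    """
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(T(e_j))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]


def matmul(B, A):
    """Matrix product B A, with B of size k-by-m and A of size m-by-n."""
    return [[sum(B[i][l] * A[l][j] for l in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]


def rotation(theta):
    """Counterclockwise rotation of R^2 about the origin through angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda v: [c * v[0] - s * v[1], s * v[0] + c * v[1]]
```

For example, `matrix_of(rotation(theta), 2)` has rows `[cos(theta), -sin(theta)]` and `[sin(theta), cos(theta)]` up to rounding, and for linear `T` and `U` the matrix of `lambda v: U(T(v))` agrees with `matmul(B, A)`.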

The determinant function $A \mapsto \det A$ has domain the set of all square 
matrices over $\mathbb{F}$ and has range $\mathbb{F}$. It is uniquely defined by the 
three properties 


(i) $\det A$ is linear in each row of $A$ if the other rows are held fixed, 
(ii) $\det A = 0$ if two rows of $A$ are equal, 
(iii) $\det I = 1$ if $I$ denotes the identity matrix of any size. 


These properties enable one to calculate det A by row reducing the matrix A. 
Specifically replacement of a row by the sum of it and a multiple of another row 
leaves det A unchanged, multiplication of a row by a constant to make the diagonal 
entry be one means pulling out the diagonal entry as a scalar factor multiplying 
the determinant, and interchanging two rows multiplies the determinant by —1. 
After the row reduction is complete for a square matrix, either the reduced 
row-echelon form is the identity matrix and (iii) says that the determinant is 1 or else 
the reduced row-echelon form has a row of 0’s, and (i) and (ii) imply that the 
determinant is 0. 
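
The row-reduction recipe just described can be sketched in code. The function below (a hypothetical name, using exact rational arithmetic from the standard library so that no rounding occurs) tracks the three effects mentioned above: row interchanges flip the sign, scaling a row pulls out a factor, and adding a multiple of one row to another changes nothing. It reduces only to triangular form with 1's on the diagonal, which is enough to read off the determinant.

```python
from fractions import Fraction


def det_by_row_reduction(rows):
    """Determinant of a square matrix via row operations, in exact arithmetic."""
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry here.
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            # No pivot in this column: the matrix is singular.
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # interchanging two rows multiplies det by -1
        pivot_value = A[col][col]
        det *= pivot_value  # pull the diagonal entry out as a scalar factor
        A[col] = [x / pivot_value for x in A[col]]
        for r in range(col + 1, n):
            factor = A[r][col]
            # Adding a multiple of another row leaves det unchanged.
            A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return det
```

A matrix with proportional rows, such as `[[1, 2], [2, 4]]`, runs out of pivots and the function returns 0, matching the case in which the reduced form has a row of 0's.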

The determinant function has the following additional properties, which may 
be regarded as consequences of (i), (ii), and (iii) above: 
(iv) $\det A \neq 0$ if and only if $A$ is invertible, 
(v) $\det A = \det A^{\mathrm{tr}}$, where $A^{\mathrm{tr}}$ is the transpose of $A$, 
(vi) $\det(AB) = (\det A)(\det B)$, 
(vii) $\det A = \sum_{\sigma} (\operatorname{sgn} \sigma) A_{1,\sigma(1)} \cdots A_{n,\sigma(n)}$ 
if $A$ is $n$-by-$n$ with entries $A_{i,j}$; the sum is taken over all permutations 
$\sigma$ of $\{1, \dots, n\}$, with $\operatorname{sgn} \sigma$ denoting the sign of $\sigma$, 
(viii) (expansion by cofactors) for $n > 1$ if $\widehat{A_{ij}}$ denotes the 
$(n-1)$-by-$(n-1)$ matrix obtained by deleting the $i$th row and $j$th column from the 
$n$-by-$n$ matrix $A$, then 
$\det A = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \det \widehat{A_{ij}}$ for all $i$ and 
$\det A = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} \det \widehat{A_{ij}}$ for all $j$, 
(ix) (Cramer’s rule) if $\det A \neq 0$, if $v$ is in $\mathbb{F}^n$, and if $A_j$ denotes 
the matrix obtained by replacing the $j$th column of $A$ by $v$, then the $j$th entry of 
the unique solution $x \in \mathbb{F}^n$ of $Ax = v$ is $x_j = \det A_j / \det A$. 
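
Properties (vii) and (ix) are explicit enough to code directly for small matrices. The sketch below uses hypothetical names and exact rational arithmetic; it indexes rows and columns from 0 rather than 1, and it is meant only as an illustration, since the sum over permutations has $n!$ terms.

```python
from fractions import Fraction
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Determinant via the sum over permutations in property (vii)."""
    n = len(A)
    total = Fraction(0)
    for perm in permutations(range(n)):
        term = Fraction(sign(perm))
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total


def cramer(A, v):
    """Solve A x = v by Cramer's rule (ix); requires det A != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule needs det A != 0")
    x = []
    for j in range(len(A)):
        # A_j is A with its j-th column replaced by v.
        A_j = [row[:j] + [v[i]] + row[j + 1:] for i, row in enumerate(A)]
        x.append(det(A_j) / d)
    return x
```

For `A = [[2, 1], [1, 3]]` and `v = [3, 5]`, `cramer(A, v)` gives the fractions 4/5 and 7/5, which can be confirmed by substituting back into $Ax = v$.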


A8. Factorization and Roots of Polynomials 


The first objective of this section is to prove unique factorization of real and 
complex polynomials. Let $\mathbb{F}$ denote either the reals $\mathbb{R}$ or the complex 
numbers $\mathbb{C}$. 

We work with polynomials with coefficients in $\mathbb{F}$. These are expressions 
$P(X) = a_n X^n + \cdots + a_1 X + a_0$ with $a_n, \dots, a_1, a_0$ in $\mathbb{F}$. 
Although it is tempting to think of $P(X)$ as a function with independent variable $X$, 
it is better to identify $P$ with the sequence $(a_0, a_1, \dots, a_n, 0, 0, \dots)$ of 
coefficients. For this setting, a polynomial (in one “indeterminate”) may be defined as 
a sequence of members of $\mathbb{F}$ such that all terms of the sequence are 0 from some 
point on. The indexing of the sequence is to begin with 0. Addition, scalar 
multiplication, and polynomial multiplication are then defined in the expected way so as 
to match the operations on functions. The usual associative, commutative, and 
distributive laws are then valid. 

Nevertheless, it is still convenient to use the notation $X$ in writing explicit 
polynomials. If $r$ is in $\mathbb{F}$, we can evaluate 
$P(X) = a_n X^n + \cdots + a_1 X + a_0$ at $r$, and the result is the number 
$P(r) = a_n r^n + \cdots + a_1 r + a_0$. We say that $r$ is a root of $P$ if $P(r) = 0$. 
The degree of a polynomial $P$, denoted by $\deg P$, is the largest integer $n$ such 
that the coefficient of $X^n$ is nonzero; the notion of “degree” is left undefined for 
the 0 polynomial, i.e., the polynomial all of whose coefficients are 0. A factor of a 
polynomial $A(X)$ is a polynomial $B(X)$ such that $A(X) = B(X)Q(X)$ for some 
polynomial $Q(X)$; we say also that $B(X)$ and $Q(X)$ divide $A(X)$. In this case, if 
$B$ and $Q$ are not 0, then $A$ is not 0 and $\deg A = \deg B + \deg Q$. 

Division Algorithm. If A(X) and B(X) are polynomials with coefficients in 
F and if B(X) is not the 0 polynomial, then there exist unique polynomials Q(X) 
and R(X) such that 
(a) A(X) = B(X)Q(X) + R(X) and 
(b) either R(X) is the 0 polynomial or deg R < deg B. 


REMARK. This result codifies the usual method of dividing polynomials in 
high-school algebra. That method writes A(X)/B(X) = Q(X) + R(X)/B(X), 
and then one obtains the above result by multiplying by B(X). The polynomial 
Q is the quotient in the division, and R(X) is the remainder. 


PROOF OF UNIQUENESS. If $A = BQ_1 + R_1$ also, then $B(Q - Q_1) = R_1 - R$. 
Without loss of generality, $R_1 - R$ is not the 0 polynomial since otherwise 
$Q - Q_1 = 0$ also. Then 

\[ \deg B + \deg(Q - Q_1) = \deg(R_1 - R) \le \max\{\deg R, \deg R_1\} < \deg B, \]

and we have a contradiction. 


PROOF OF EXISTENCE. If $A = 0$ or $\deg A < \deg B$, we take $Q = 0$ and 
$R = A$, and we are done. Otherwise we induct on $\deg A$. Assume the result 
for degree $\le n - 1$, and let $\deg A = n$. Write $A = a_n X^n + A_1$ with $A_1 = 0$ 
or $\deg A_1 < \deg A$. Let $B = b_k X^k + B_1$ with $B_1 = 0$ or $\deg B_1 < \deg B$. 
Put $Q_1 = a_n b_k^{-1} X^{n-k}$. Then 

\[ A - BQ_1 = a_n X^n + A_1 - a_n X^n - a_n b_k^{-1} X^{n-k} B_1 
= A_1 - a_n b_k^{-1} X^{n-k} B_1 \]

with the right side equal to 0 or of degree $< \deg A$. Then the right side, by 
induction, is of the form $BQ_2 + R$, and $A = B(Q_1 + Q_2) + R$ is the required 
decomposition. 
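
The existence proof is effectively an algorithm: repeatedly subtract $a_n b_k^{-1} X^{n-k} B$ to cancel the leading term. A minimal sketch follows, assuming that a polynomial is stored as its coefficient list $[a_0, a_1, \dots, a_n]$ with the constant term first, as in the sequence description above, and that coefficients are rational. The names `trim` and `poly_divmod` are hypothetical, and the induction is unrolled into a loop.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; the 0 polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(A, B):
    """Return (Q, R) with A = B*Q + R and either R = 0 or deg R < deg B.

    Polynomials are coefficient lists [a_0, a_1, ..., a_n].
    """
    A = trim(Fraction(c) for c in A)
    B = trim(Fraction(c) for c in B)
    if not B:
        raise ZeroDivisionError("B must not be the 0 polynomial")
    Q = [Fraction(0)] * max(len(A) - len(B) + 1, 0)
    R = A
    while len(R) >= len(B):
        shift = len(R) - len(B)   # the exponent n - k
        coeff = R[-1] / B[-1]     # the scalar a_n * b_k^{-1}
        Q[shift] = coeff
        # Subtract coeff * X^shift * B, which cancels the leading term of R.
        for i, b in enumerate(B):
            R[i + shift] -= coeff * b
        R = trim(R)
    return trim(Q), R
```

For example, dividing $X^3 - 1$ by $X - 1$, that is `poly_divmod([-1, 0, 0, 1], [-1, 1])`, gives quotient $X^2 + X + 1$ and remainder 0, in agreement with the Factor Theorem since 1 is a root of $X^3 - 1$.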


Corollary 1 (Factor Theorem). If $r$ is in $\mathbb{F}$ and $P$ is a polynomial, then 
$X - r$ divides $P$ if and only if $P(r) = 0$. 


PROOF. If $P = (X - r)Q$, then $P(r) = (r - r)Q(r) = 0$. Conversely 
let $P(r) = 0$. Taking $B(X) = X - r$ in the Division Algorithm, we obtain 
$P = (X - r)Q + R$ with $R = 0$ or $\deg R < \deg(X - r) = 1$. In either event we 
have $0 = P(r) = (r - r)Q(r) + R(r)$, and thus $R(r) = 0$. Of the two choices, 
we must have $R = 0$, and then $P = (X - r)Q$. 


Proposition. If $P$ is a nonzero polynomial with coefficients in $\mathbb{F}$ and if 
$\deg P = n$, then $P$ has at most $n$ distinct roots. 


PROOF. Let $r_1, \dots, r_{n+1}$ be distinct roots of $P(X)$. By the Factor 
Theorem, $X - r_1$ is a factor of $P(X)$. We prove inductively on $k$ that the product 
$(X - r_1)(X - r_2) \cdots (X - r_k)$ is a factor of $P(X)$. Assume that this assertion 
holds for $k$, so that $P(X) = (X - r_1) \cdots (X - r_k) Q(X)$ and 

\[ 0 = P(r_{k+1}) = (r_{k+1} - r_1) \cdots (r_{k+1} - r_k)\, Q(r_{k+1}). \]

Since the $r_j$’s are distinct, we must have $Q(r_{k+1}) = 0$. By the Factor Theorem, 
we can write $Q(X) = (X - r_{k+1}) R(X)$ for some polynomial $R(X)$. Substitution 
gives $P(X) = (X - r_1) \cdots (X - r_k)(X - r_{k+1}) R(X)$, and 
$(X - r_1) \cdots (X - r_{k+1})$ is exhibited as a factor of $P(X)$. This completes the 
induction. Consequently 

\[ P(X) = (X - r_1) \cdots (X - r_{n+1})\, S(X) \]

for some polynomial $S(X)$. Comparing the degrees of the two sides, we find that 
$\deg S = -1$, and we have a contradiction. 


A greatest common divisor of polynomials $A$ and $B$ with $B \neq 0$ is any 
polynomial $D$ of maximum degree such that $D$ divides $A$ and $D$ divides $B$. 
The Euclidean algorithm is the iterative process that makes use of the Division 
Algorithm in the form 

\[ \begin{aligned} 
A &= BQ_1 + R_1, & R_1 &= 0 \text{ or } \deg R_1 < \deg B, \\ 
B &= R_1 Q_2 + R_2, & R_2 &= 0 \text{ or } \deg R_2 < \deg R_1, \\ 
R_1 &= R_2 Q_3 + R_3, & R_3 &= 0 \text{ or } \deg R_3 < \deg R_2, \\ 
&\;\;\vdots \\ 
R_{n-2} &= R_{n-1} Q_n + R_n, & R_n &= 0 \text{ or } \deg R_n < \deg R_{n-1}, \\ 
R_{n-1} &= R_n Q_{n+1}. 
\end{aligned} \]

In the above computation the integer $n$ is defined by the conditions that $R_n \neq 0$ 
and that $R_{n+1} = 0$. Such an $n$ must exist since 
$\deg B > \deg R_1 > \cdots \ge 0$. 


Theorem. Let $A$ and $B$ be polynomials with coefficients in $\mathbb{F}$ and with 
$B \neq 0$, and let $R_1, \dots, R_n$ be the remainders generated by the Euclidean 
algorithm when applied to $A$ and $B$. Then 

(a) $R_n$ is a greatest common divisor of $A$ and $B$, 
(b) the greatest common divisor $D$ of $A$ and $B$ is unique up to scalar 
multiplication, 
(c) any $D_1$ that divides both $A$ and $B$ necessarily divides $D$, 
(d) there exist polynomials $P$ and $Q$ with $AP + BQ = D$. 

PROOF. Let $D_1$ divide $A$ and $B$. From $A = BQ_1 + R_1$, we see that $D_1$ 
divides $R_1$. From $B = R_1 Q_2 + R_2$, we see that $D_1$ divides $R_2$. Continuing 
in this way through $R_{n-2} = R_{n-1} Q_n + R_n$, we see that $D_1$ divides $R_n$. In 
particular any greatest common divisor $D$ of $A$ and $B$ divides $R_n$, and therefore 
has $\deg D \le \deg R_n$. In the reverse direction, $R_{n-1} = R_n Q_{n+1}$ shows that 
$R_n$ divides $R_{n-1}$. From $R_{n-2} = R_{n-1} Q_n + R_n$, we see that $R_n$ divides 
$R_{n-2}$. Continuing in this way through $B = R_1 Q_2 + R_2$, we see that $R_n$ divides 
$B$. Finally $A = BQ_1 + R_1$ shows that $R_n$ divides $A$. Thus $R_n$ is a divisor of 
both $A$ and $B$, and we have seen that its degree is maximal. This proves (a). 

If $D$ is a greatest common divisor of $A$ and $B$, it follows that $D$ divides $R_n$ 
and $\deg D = \deg R_n$. This proves (b). We have seen that any $D_1$ that divides 
$A$ and $B$ necessarily divides $R_n$, and then (c) follows from the uniqueness of the 
greatest common divisor up to scalar multiplication. 

Put $R_{n+1} = 0$, $R_0 = B$, and $R_{-1} = A$. We prove by induction downward 
that there are polynomials $S_k$ and $T_k$ such that $R_k S_k + R_{k+1} T_k = D$. The 
base case of the induction is $k = n$, where we have $R_n 1 + R_{n+1} 0 = D$. Suppose 
that $R_k S_k + R_{k+1} T_k = D$ with $k \ge 0$. We rewrite 
$R_{k-1} = R_k Q_{k+1} + R_{k+1}$ as $R_{k+1} = R_{k-1} - R_k Q_{k+1}$ and substitute 
to obtain 

\[ D = R_k S_k + R_{k+1} T_k = R_k S_k + R_{k-1} T_k - R_k Q_{k+1} T_k. \]

In other words, we can take $S_{k-1} = T_k$ and $T_{k-1} = S_k - Q_{k+1} T_k$, and our 
inductive assertion is proved for $k - 1$. The assertion for $-1$ proves (d). 
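
Parts (a) and (d) can be computed together. The sketch below is one possible implementation under stated assumptions: polynomials are coefficient lists $[a_0, a_1, \dots]$ with rational coefficients, all helper names are hypothetical, and the bookkeeping runs forward alongside the Euclidean algorithm (the standard "extended Euclid" arrangement) rather than by the downward induction used in the proof. Both routes produce polynomials $P$ and $Q$ with $AP + BQ$ equal to the last nonzero remainder.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; the 0 polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim((p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                for i in range(n))


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(A, B):
    """Division Algorithm: A = B*Q + R with R = 0 or deg R < deg B (B nonzero)."""
    Q = [Fraction(0)] * max(len(A) - len(B) + 1, 0)
    R = list(A)
    while len(R) >= len(B):
        shift, coeff = len(R) - len(B), R[-1] / B[-1]
        Q[shift] = coeff
        for i, b in enumerate(B):
            R[i + shift] -= coeff * b
        R = trim(R)
    return trim(Q), R


def extended_gcd(A, B):
    """Return (D, P, Q) with A*P + B*Q = D, where D is the last nonzero remainder."""
    A = trim(Fraction(c) for c in A)
    B = trim(Fraction(c) for c in B)
    if not B:
        raise ValueError("B must not be the 0 polynomial")
    # Invariant: r0 = A*s0 + B*t0 and r1 = A*s1 + B*t1.
    r0, s0, t0 = A, [Fraction(1)], []
    r1, s1, t1 = B, [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        neg_q = [-c for c in q]
        r0, s0, t0, r1, s1, t1 = (r1, s1, t1, r,
                                  add(s0, mul(neg_q, s1)),
                                  add(t0, mul(neg_q, t1)))
    return r0, s0, t0
```

With $A = X^2 - 1$ and $B = X^2 - 3X + 2$, the call `extended_gcd([-1, 0, 1], [2, -3, 1])` returns $D = 3X - 3$, a scalar multiple of the common factor $X - 1$, together with $P = 1$ and $Q = -1$.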


A nonzero polynomial P with coefficients in F is prime if the only factors of 
P are the scalar multiples of 1 and the scalar multiples of P. 


Lemma. If A and B are nonzero polynomials with coefficients in F and if P 
is a prime polynomial such that P divides AB, then P divides A or P divides B. 


PROOF. Suppose that $P$ does not divide $A$. Then 1 is a greatest common divisor 
of $A$ and $P$, and part (d) of the above theorem produces polynomials $S$ and $T$ 
such that $AS + PT = 1$. Multiplication by $B$ gives $ABS + PTB = B$. Then $P$ 
divides $ABS$ because it divides $AB$, and $P$ divides $PTB$ because it divides $P$. 
Hence $P$ divides $B$. 


Theorem (unique factorization). Every polynomial of degree $\ge 1$ with 
coefficients in $\mathbb{F}$ is a product of primes. This factorization is unique up to 
order and to scalar multiplication of the prime factors. 


PROOF. If $A$ is given and is not prime, decompose $A = BC$ with $\deg B < \deg A$ 
and $\deg C < \deg A$. For each factor that is not prime, write the factor as the 
product of two polynomials of lower degree. This process, when continued in 
this fashion, must stop since the degrees strictly decrease with any factorization. 
This proves existence. 

For uniqueness, assume the contrary and choose $m \ge 1$ as small as possible so 
that some polynomial has two distinct factorizations $P_1 \cdots P_m = Q_1 \cdots Q_n$ 
into primes, apart from order and scalar factors. Adjusting scalar multiples, we may 
assume that each $P_i$ and $Q_j$ has leading coefficient 1 and that there is a global 
coefficient multiplying each side. These global coefficients must be equal, being 
the coefficients of the largest power of $X$ on each side. Thus we may cancel them 
and assume that each $P_i$ and $Q_j$ has leading coefficient 1. By the lemma, the 
fact that $Q_1$ is prime means that $Q_1$ must divide one of $P_1, \dots, P_m$. 
Reordering the factors, we may assume that $Q_1$ divides $P_1$. Since $P_1$ is prime, 
$P_1$ is a scalar multiple of $Q_1$. Since $P_1$ and $Q_1$ both have leading 
coefficient 1, $P_1 = Q_1$. Then we can cancel $P_1$ and $Q_1$ from both of our 
factorizations, obtaining distinct factorizations with fewer than $m$ factors on one 
side. By the minimality of $m$, either we have arrived at a contradiction or we now 
have the polynomial 1 left on one side. Then the other side is 1, and the two sides 
match. 


If $\mathbb{F}$ is $\mathbb{R}$, then $X^2 + 1$ is prime. But $X^2 + 1$ is not prime when 
$\mathbb{F} = \mathbb{C}$ since $X^2 + 1 = (X + i)(X - i)$. The Fundamental Theorem of 
Algebra, stated below, implies that every prime polynomial over $\mathbb{C}$ is of 
degree 1. It is possible to prove the Fundamental Theorem of Algebra within complex 
analysis as a consequence of Liouville’s Theorem or within modern algebra as a 
consequence of Galois theory and the Sylow theorems. This text gives a proof of the 
result in Section II.7 using the Heine–Borel Theorem and other facts about compactness. 


Fundamental Theorem of Algebra. Any polynomial with coefficients in $\mathbb{C}$ 
and with degree $\ge 1$ has at least one root. 


Corollary. Let $P$ be a nonzero polynomial of degree $n$ with coefficients in 
$\mathbb{C}$, and let $r_1, \dots, r_k$ be the roots. Then there exist unique integers 
$m_j > 0$ such that $P(X)$ is a multiple of $\prod_{j=1}^{k} (X - r_j)^{m_j}$. The 
numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$. 


PROOF. We may assume that $\deg P > 0$. We apply unique factorization to 
$P(X)$. It follows from the Fundamental Theorem of Algebra and the Factor 
Theorem that each prime polynomial with coefficients in $\mathbb{C}$ has degree 1. Thus 
the unique factorization of $P(X)$ has to be of the form $c \prod_{l=1}^{n} (X - z_l)$ 
for some complex numbers $z_l$ that are unique up to order. The $z_l$’s are roots, and 
every root is a $z_l$, by the Factor Theorem. Grouping like factors proves the desired 
factorization and its uniqueness. The numbers $m_j$ have $\sum_{j=1}^{k} m_j = n$ by a 
count of degrees. 


The integers m; in the corollary are called the multiplicities of the roots of the 
polynomial P(X). 


A9. Partial Orderings and Zorn’s Lemma 


A partial ordering on a set $S$ is a relation between $S$ and itself, i.e., a subset of 
$S \times S$, satisfying two properties. We define the expression $a \le b$ to mean that 
the ordered pair $(a, b)$ is a member of the relation, and we say that “$\le$” is the 
partial ordering. The properties are 
(i) $a \le a$ for all $a$ in $S$, i.e., $\le$ is reflexive, 
(ii) $a \le b$ and $b \le c$ together imply $a \le c$ whenever $a$, $b$, and $c$ are in 
$S$, i.e., $\le$ is transitive. 

An example of such an $S$ is any set of subsets of a set $X$, with $\le$ taken to 
be inclusion $\subseteq$. This particular partial ordering has a third property of 
interest, namely 

(iii) $a \le b$ and $b \le a$ with $a$ and $b$ in $S$ imply $a = b$. 

However, the validity of (iii) has no bearing on Zorn’s Lemma below. A partial 
ordering is said to be a total ordering or simple ordering if (iii) holds and also 

(iv) any $a$ and $b$ in $S$ have either $a \le b$ or $b \le a$. 


For the sake of a result to be proved at the end of the section, let us interpolate 
one further definition: a totally ordered set is said to be well ordered if every 
nonempty subset has a least element, i.e., if each nonempty subset contains an 
element $a$ such that $a \le b$ for all $b$ in the subset. 

A chain in a partially ordered set $S$ is a totally ordered subset. An upper 
bound for a chain $T$ is an element $u$ in $S$ such that $c \le u$ for all $c$ in $T$. A 
maximal element in $S$ is an element $m$ such that $m \le a$ for some $a$ in $S$ implies 
$a \le m$. (If (iii) holds, we can then conclude that $m = a$.) 
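
For a finite partially ordered set the definition of maximal element can be tested directly. The sketch below, with the hypothetical helper `maximal_elements`, applies the definition to a small family of subsets ordered by inclusion. It illustrates only the definition; Zorn's Lemma itself concerns arbitrary partially ordered sets, where no such exhaustive search is available.

```python
def maximal_elements(S, leq):
    """Elements m of S such that m <= a implies a <= m, for every a in S."""
    return [m for m in S if all(leq(a, m) for a in S if leq(m, a))]


# A family of subsets of {1, 2, 3}, ordered by inclusion.
subsets = [frozenset(s) for s in ([], [1], [2], [1, 2], [3])]
maximal = maximal_elements(subsets, lambda a, b: a <= b)
```

In this family the maximal elements are $\{1, 2\}$ and $\{3\}$, which shows that a maximal element need not be a largest element: neither of the two contains the other.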


Zorn’s Lemma. If S is a nonempty partially ordered set in which every chain 
has an upper bound, then S has a maximal element. 


REMARKS. Zorn’s Lemma will be proved below using the Axiom of Choice, 
which was stated in Section A1. It is an easy exercise to see, conversely, 
that Zorn’s Lemma implies the Axiom of Choice. It is customary with many 
mathematical writers to mention Zorn’s Lemma each time it is invoked, even 
though most writers nowadays do not ordinarily acknowledge uses of the Axiom 
of Choice. Before coming to the proof, we give an example of how Zorn’s Lemma 
is used. 


EXAMPLE. Zorn’s Lemma gives a quick proof that any real vector space $V$ 
has a basis. In fact, let $S$ be the set of all linearly independent subsets of $V$, and 
order $S$ by inclusion upward as in the example above of a partial ordering. The 
set $S$ is nonempty because $\varnothing$ is a linearly independent subset of $V$. Let 
$T$ be a chain in $S$, and let $u$ be the union of the members of $T$. If $t$ is in $T$, we 
certainly have $t \subseteq u$. Let us see that $u$ is linearly independent. For $u$ to be 
dependent would mean that there are vectors $x_1, \dots, x_n$ in $u$ with 
$r_1 x_1 + \cdots + r_n x_n = 0$ for some system of real numbers not all 0. Let $x_j$ be 
in the member $t_j$ of the chain $T$. Since $t_1 \subseteq t_2$ or $t_2 \subseteq t_1$, 
$x_1$ and $x_2$ are both in $t_1$ or both in $t_2$. To keep the notation neutral, say 
they are both in $t_2'$. Since $t_2' \subseteq t_3$ or $t_3 \subseteq t_2'$, all of 
$x_1, x_2, x_3$ are in $t_2'$ or they are all in $t_3$. Say they are all in $t_3'$. 
Continuing in this way, we arrive at one of the sets $t_1, \dots, t_n$, say $t_n'$, such 
that all of $x_1, \dots, x_n$ are in $t_n'$. The members of $t_n'$ are linearly 
independent by assumption, and we obtain the contradiction $r_1 = \cdots = r_n = 0$. We 
conclude that the chain $T$ has an upper bound in $S$. By Zorn’s Lemma, $S$ has a 
maximal element, say $m$. If $m$ is not a basis, it fails to span. If a vector $x$ is 
not in its span, it is routine to see that $m \cup \{x\}$ is linearly independent and 
properly contains $m$, in contradiction to the maximality of $m$. We conclude that $m$ 
is a basis. 


We now begin the proof of Zorn’s Lemma. If $T$ is a chain in a partially ordered 
set $S$, then an upper bound $u_0$ for $T$ is a least upper bound for $T$ if $u_0 \le u$ 
for all upper bounds $u$ of $T$. If (iii) holds in $S$, then there can be at most one 
least upper bound for $T$. In fact, if $u_0$ and $u_0'$ are least upper bounds, then 
$u_0 \le u_0'$ since $u_0$ is a least upper bound, and $u_0' \le u_0$ since $u_0'$ is a 
least upper bound; by (iii), $u_0 = u_0'$. 


Lemma. Let $X$ be a nonempty partially ordered set such that (iii) holds, and 
write $\le$ for the partial ordering. Suppose that $X$ has the additional property that 
each nonempty chain in $X$ has a least upper bound in $X$. If $f : X \to X$ is a 
function such that $x \le f(x)$ for all $x$ in $X$, then there exists an $x_0$ in $X$ 
with $f(x_0) = x_0$. 


PROOF. A nonempty subset $E$ of $X$ will be called admissible for purposes of 
this proof if $f(E) \subseteq E$ and if the least upper bound of each nonempty chain in 
$E$, which exists in $X$ by assumption, actually lies in $E$. By assumption, $X$ is an 
admissible subset of $X$. If $x$ is in $X$, then the intersection of admissible subsets 
of $X$ containing $x$ is admissible. Thus the intersection $A_x$ of all admissible 
subsets containing $x$ is an admissible subset containing $x$. The set of all $y$ in 
$X$ with $x \le y$ is an admissible subset of $X$ containing $x$, and it follows that 
$x \le y$ for all $y$ in $A_x$. 

By hypothesis, $X$ is nonempty. Fix an element $a$ in $X$, and let $A = A_a$. The 
main step is to prove that $A$ is a chain. Once that is established, we argue as 
follows: Since $A$ is a chain, its least upper bound $x_0$ lies in $X$, and since $A$ is 
an admissible subset, $x_0$ lies in $A$. By admissibility, $f(A) \subseteq A$. Hence 
$f(x_0)$ is in $A$. Since $x_0$ is an upper bound of $A$, $f(x_0) \le x_0$. On the other 
hand, $x_0 \le f(x_0)$ by the assumed property of $f$. Therefore $f(x_0) = x_0$ by (iii). 
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To prove that A is a chain, consider the subset C of members x of A with the 
property that there is a nonempty chain C, in A containing a and x such that 


ea<y<~x forall yinC,, 

e f(Cx — {x}) € Cy, and 

e the least upper bound of any nonempty subchain of C, is in Cy. 
The element a is in C because we can take C, = {a}. If x is in C, so that C, 
exists, let us use the bulleted properties to see that 


A= A, UC,. (*) 


We have A > C;, by definition; also AN A, is an admissible set containing x and 
hence containing A,.,and thus A > A,. Therefore A > A, UC,. For the reverse 
inclusion it is enough to prove that A, UC, is an admissible subset of X containing 
a. The element a is in C,, and thus is in A, UC,.. For the admissibility we have to 
show that f(A, UC,) C A,UC, and that the least upper bound of any nonempty 
chain in A, UC, lies in Ay UC,. Since x lies in A,, Ay UC, = Ay U (Cy — {x}) 
and f(A, UCy) = f(Ax) U f (Cx — {x}) © Ay U Cy, the inclusion following 
from the admissibility of A, and the second bulleted property of Cy. 

To complete the proof of (*), take a nonempty chain in A, U C,, and let u be 
its least upper bound in X; it is enough to show that u is in A, UC,. The element 
u is necessarily in A since A is admissible. Observe that 


y<x and x<z whenever y is in C, and z is in A,. (2) 


If the chain has at least one member in A,., then (>) implies that x < u, and 
hence the set of members of the chain that lie in A, forms a nonempty chain in 
A, with least upper bound uw. Since A, is admissible, u is in A,. Otherwise the 
chain has all its members in C,;, and then u is in C, by the third bulleted property 
of Cy. 

This completes the proof of (+). Although we do not need the fact, let us 
observe that combining (:) and (*) yields Ay 1 Cy = {x} whenever C, exists. 
Thus C, = (A — Ax) U {x} if Cy exists. In particular the defining properties of 
C,, determine C,, completely. 

Recalling that C is the subset of members of A such that C, exists, we shall 
show that C is an admissible set containing a. If we can do so, then it follows 
that C D> Ag = Aandhence C = A. This fact, in combination with (x) and (*«), 
proves that A is a chain: if x and y are in A, («) shows that y is in A, or C,, and 
(«) shows that x < y in the first case and y < x in the second case. 

Thus the proof will be complete if we show that $C$ is admissible and contains
$a$. We already observed that it contains $a$. We need to see that $f(C) \subseteq C$
and that the least upper bound of any nonempty chain of $C$ lies in $C$. For the
first of these conclusions, let us see that if $x$ is in $C$, then $C_{f(x)}$ exists and can
be taken to be $C_x \cup \{f(x)\}$. Property $(**)$ proves that $C_x \cup \{f(x)\}$ satisfies the
first bulleted property of $C_{f(x)}$, and the second bulleted property follows from
that same property of $C_x$ and the fact that $x \le f(x)$. Any nonempty chain in
$C_x \cup \{f(x)\}$ either lies in $C_x$ and has its least upper bound in $C_x$ or else contains
$f(x)$ as a member and then has $f(x)$ as its least upper bound. Thus $C_x \cup \{f(x)\}$
satisfies the third bulleted property of $C_{f(x)}$ and can be taken to be $C_{f(x)}$. Hence
$f(x)$ is in $C$, and $f(C) \subseteq C$.

Finally take any nonempty chain $\{x_\alpha\}$ in $C$, and let $u$ be its least upper bound,
which is necessarily in $A$. Form the set $\bigl(\bigcup_\alpha C_{x_\alpha}\bigr) \cup \{u\}$. It is immediate that
this set satisfies the three bulleted properties of $C_u$ and therefore can be taken to
be $C_u$. Hence $u$ is in $C$, and $C$ contains the least upper bound of any nonempty
chain in $C$. Then $C$ is indeed an admissible set containing $a$.


PROOF OF ZORN’S LEMMA. Let $S$ be a partially ordered set, with partial
ordering $\le$, in which every chain has an upper bound. Let $X$ be the partially
ordered system, ordered by inclusion upward $\subseteq$, of nonempty chains in $S$ (see the
footnote on chains below). The partially ordered system $X$, being given by ordinary
inclusion, satisfies property (iii). A nonempty chain $C$ in $X$ is a nested system of
chains $c_\alpha$ of $S$, and $\bigcup_\alpha c_\alpha$ is a chain in $S$ that is a least upper bound for $C$.
The lemma is therefore applicable to any function $f : X \to X$ such that
$c \subseteq f(c)$ for all $c$ in $X$. We use the lemma to produce a maximal chain in $X$.

Arguing by contradiction, suppose that no chain within $S$ is maximal under
inclusion. For each nonempty chain $c$ within $S$, let $f(c)$ be a chain with $c \subseteq f(c)$
and $c \ne f(c)$. (This choice of $f(c)$ for each $c$ is where we use the Axiom of
Choice.) The result is a function $f : X \to X$ of the required kind, the lemma
says that $f(c) = c$ for some $c$ in $X$, and we arrive at a contradiction. We conclude
that there is some maximal chain $c_0$ within $S$.

By assumption in Zorn’s lemma, every nonempty chain within $S$ has an upper
bound. Let $u_0$ be an upper bound for the maximal chain $c_0$. If $u$ is a member of $S$
with $u_0 \le u$, then $c_0 \cup \{u\}$ is a chain and maximality implies that $c_0 \cup \{u\} = c_0$.
Therefore $u$ is in $c_0$, and $u \le u_0$. This is the condition that $u_0$ is a maximal
element of $S$.

Footnote. Here a chain is simply a certain kind of subset of $S$, and no element of $S$ can occur more than
once in it even if (iii) fails for the partial ordering. Thus if $S = \{x, y\}$ with $x \le y$ and $y \le x$, then
$\{x, y\}$ is in $X$ and in fact is maximal in $X$.


Corollary (Zermelo’s well-ordering theorem). Every set has a well ordering.


PROOF. Let $S$ be a set, and let $\mathcal{E}$ be the family of all pairs $(E, \le_E)$ such that $E$
is a subset of $S$ and $\le_E$ is a well ordering of $E$. The family $\mathcal{E}$ is nonempty since
$(\emptyset, \emptyset)$ is a member of it. We partially order $\mathcal{E}$ by a notion of “inclusion as an
initial segment,” saying that $(E, \le_E) \le (F, \le_F)$ if

(i) $E \subseteq F$,

(ii) $a$ and $b$ in $E$ with $a \le_E b$ implies $a \le_F b$,

(iii) $a$ in $E$ and $b$ in $F$ but not $E$ together imply $a \le_F b$.

In preparation for applying Zorn’s Lemma, let $C = \{(E_\alpha, \le_\alpha)\}$ be a chain in $\mathcal{E}$,
with the $\alpha$’s running through some set $I$. Define $E_0 = \bigcup_\alpha E_\alpha$ and define $\le_0$ as
follows: If $e_1$ and $e_2$ are in $E_0$, let $e_1$ be in $E_{\alpha_1}$ with $\alpha_1$ in $I$, and let $e_2$ be in $E_{\alpha_2}$
with $\alpha_2$ in $I$. Since $C$ is a chain, we may assume without loss of generality that
$(E_{\alpha_1}, \le_{\alpha_1}) \le (E_{\alpha_2}, \le_{\alpha_2})$, so that $E_{\alpha_1} \subseteq E_{\alpha_2}$ in particular. Then $e_1$ and $e_2$ are both
in $E_{\alpha_2}$, and we define $e_1 \le_0 e_2$ if $e_1 \le_{\alpha_2} e_2$, or $e_2 \le_0 e_1$ if $e_2 \le_{\alpha_2} e_1$. Because of
(i) and (ii) above, the result is well defined independently of the choice of $\alpha_1$ and
$\alpha_2$. Similar reasoning shows that $\le_0$ is a total ordering of $E_0$. If we can prove
that $\le_0$ is a well ordering, then $(E_0, \le_0)$ is evidently an upper bound in $\mathcal{E}$ for the
chain $C$, and Zorn’s Lemma is applicable.

Now suppose that $F$ is a nonempty subset of $E_0$. Pick an element of $F$, and
let $E_{\alpha_0}$ be a set in the chain that contains it. Since $(E_{\alpha_0}, \le_{\alpha_0})$ is well ordered and
$F \cap E_{\alpha_0}$ is nonempty, $F \cap E_{\alpha_0}$ contains a least element $f_0$ relative to $\le_{\alpha_0}$. We show
that $f_0 \le_0 f$ for all $f$ in $F$. In fact, if $f$ is given, there are two possibilities. One
is that $f$ is in $E_{\alpha_0}$; in this case, the consistency of $\le_0$ with $\le_{\alpha_0}$ forces $f_0 \le_0 f$.
The other is that $f$ is not in $E_{\alpha_0}$ but is in some $E_{\alpha_1}$. Since $C$ is a chain and
$E_{\alpha_1} \subseteq E_{\alpha_0}$ fails, we must have $(E_{\alpha_0}, \le_{\alpha_0}) \le (E_{\alpha_1}, \le_{\alpha_1})$. Then $f$ is in $E_{\alpha_1}$ but
not $E_{\alpha_0}$, and property (iii) above says that $f_0 \le_{\alpha_1} f$. By the consistency of the
orderings, $f_0 \le_0 f$. Hence $f_0$ is a least element in $F$, and $E_0$ is well ordered.

Application of Zorn’s Lemma produces a maximal element $(E, \le_E)$ of $\mathcal{E}$. If
$E$ were a proper subset of $S$, we could adjoin to $E$ a member $s$ of $S$ not in $E$ and
define every element $e$ of $E$ to be $\le s$. The result would contradict maximality.
Therefore $E = S$, and $S$ has been well ordered.
Two sets A and B are said to have the same cardinality, written card A = card B,
if there exists a one-one function from A onto B. On any set A of sets, “having the 
same cardinality” is plainly an equivalence relation and therefore partitions A into 
disjoint equivalence classes, the sets in each class having the same cardinality. The 
question of what constitutes cardinality (or a “cardinal number”) in its own right 
is one that is addressed in set theory but that we do not need to address carefully 
here; the idea is that each equivalence class under “having the same cardinality” 
has a distinguished representative, and the cardinal number is defined to be that 
representative. We write card A for the cardinal number of a set A. 
Having addressed equality, we now introduce a partial ordering, saying that
card A $\le$ card B if there is a one-one function from A into B. The first result below
is that card A $\le$ card B and card B $\le$ card A together imply card A = card B.


Proposition (Schroeder–Bernstein Theorem). If $A$ and $B$ are sets such that
there exist one-one functions $f : A \to B$ and $g : B \to A$, then $A$ and $B$ have
the same cardinality.


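PROOF. Define the function $g^{-1} : \mathrm{image}\, g \to B$ by $g^{-1}(g(b)) = b$; this
definition makes sense since $g$ is one-one. Write $(g \circ f)^{(n)}$ for the composition
of $g \circ f$ with itself $n$ times, and define $(f \circ g)^{(n)}$ similarly. Define subsets $A_n$
and $A'_n$ of $A$ and subsets $B_n$ and $B'_n$ of $B$ for $n \ge 0$ by

$$A_n = \mathrm{image}((g \circ f)^{(n)}) - \mathrm{image}((g \circ f)^{(n)} \circ g),$$
$$A'_n = \mathrm{image}((g \circ f)^{(n)} \circ g) - \mathrm{image}((g \circ f)^{(n+1)}),$$
$$B_n = \mathrm{image}((f \circ g)^{(n)}) - \mathrm{image}((f \circ g)^{(n)} \circ f),$$
$$B'_n = \mathrm{image}((f \circ g)^{(n)} \circ f) - \mathrm{image}((f \circ g)^{(n+1)}),$$

and let

$$A_\infty = \bigcap_{n=0}^{\infty} \mathrm{image}((g \circ f)^{(n)}) \quad \text{and} \quad B_\infty = \bigcap_{n=0}^{\infty} \mathrm{image}((f \circ g)^{(n)}).$$

Then we have

$$A = A_\infty \cup \bigcup_{n=0}^{\infty} A_n \cup \bigcup_{n=0}^{\infty} A'_n \quad \text{and} \quad B = B_\infty \cup \bigcup_{n=0}^{\infty} B_n \cup \bigcup_{n=0}^{\infty} B'_n,$$

with both unions disjoint.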

Let us prove that $f$ carries $A_n$ one-one onto $B'_n$. If $a$ is in $A_n$, then $a =
(g \circ f)^{(n)}(x)$ for some $x \in A$ and $a$ is not of the form $(g \circ f)^{(n)}(g(y))$ with
$y \in B$. Applying $f$, we obtain $f(a) = (f \circ (g \circ f)^{(n)})(x) = (f \circ g)^{(n)}(f(x))$,
so that $f(a)$ is in the image of $(f \circ g)^{(n)} \circ f$. Meanwhile, if $f(a)$ is in the
image of $(f \circ g)^{(n+1)}$, then $f(a) = (f \circ g)^{(n+1)}(y) = f((g \circ f)^{(n)}(g(y)))$ for
some $y \in B$. Since $f$ is one-one, we can cancel the $f$ on the outside and obtain
$a = (g \circ f)^{(n)}(g(y))$, in contradiction to the fact that $a$ is in $A_n$. Thus $f$ carries
$A_n$ into $B'_n$, and it is certainly one-one. To see that $f(A_n)$ contains all of $B'_n$, let
$b \in B'_n$ be given. Then $b = (f \circ g)^{(n)}(f(x))$ for some $x \in A$ and $b$ is not of the
form $(f \circ g)^{(n+1)}(y)$ with $y \in B$. Hence $b = f((g \circ f)^{(n)}(x))$, i.e., $b = f(a)$
with $a = (g \circ f)^{(n)}(x)$. If this element $a$ were in the image of $(g \circ f)^{(n)} \circ g$,
we could write $a = (g \circ f)^{(n)}(g(y))$ for some $y \in B$, and then we would have
$b = f(a) = f((g \circ f)^{(n)}(g(y))) = (f \circ g)^{(n+1)}(y)$, contradiction. Thus $a$ is in
$A_n$, and $f$ carries $A_n$ one-one onto $B'_n$.

Similarly $g$ carries $B_n$ one-one onto $A'_n$. Since $A'_n$ is in the image of $g$, we can
apply $g^{-1}$ to it and see that $g^{-1}$ carries $A'_n$ one-one onto $B_n$.

The same kind of reasoning as above shows that $f$ carries $A_\infty$ one-one onto
$B_\infty$. In summary, $f$ carries each $A_n$ one-one onto $B'_n$ and carries $A_\infty$ one-one
onto $B_\infty$, while $g^{-1}$ carries each $A'_n$ one-one onto $B_n$. Then the function

$$h = \begin{cases} f & \text{on } A_\infty \text{ and each } A_n, \\ g^{-1} & \text{on each } A'_n \end{cases}$$

carries $A$ one-one onto $B$.
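
The construction in this proof can be traced mechanically: to decide which piece of $A$ an element lies in, one follows it backward through $g^{-1}$ and $f^{-1}$ alternately until the chain stops. The following Python sketch does this under stated assumptions. The names `schroeder_bernstein`, `f_inv`, and `g_inv` are illustrative and not from the text, the partial inverses are assumed to be supplied by the caller, and every backward chain is assumed to terminate (so that $A_\infty$ is empty). The sample maps $f(n) = 2n$ and $g(n) = 3n$ on the positive integers are a hypothetical example.

```python
def schroeder_bernstein(f, f_inv, g_inv):
    """Build h : A -> B from one-one maps f : A -> B and g : B -> A.

    f_inv(b) returns the a with f(a) == b, or None if b is not in image f.
    g_inv(a) returns the b with g(b) == a, or None if a is not in image g.
    Assumption: every backward chain terminates (A_infinity is empty);
    otherwise the loop below would not stop.
    """
    def h(a):
        x, in_A = a, True
        while True:
            if in_A:
                b = g_inv(x)
                if b is None:        # chain stops in A: a lies in some A_n
                    return f(a)
                x, in_A = b, False
            else:
                a_prev = f_inv(x)
                if a_prev is None:   # chain stops in B: a lies in some A'_n
                    return g_inv(a)
                x, in_A = a_prev, True
    return h


# Hypothetical example: A = B = positive integers, f(n) = 2n, g(n) = 3n.
def f(n):
    return 2 * n

def f_inv(b):
    return b // 2 if b % 2 == 0 else None

def g_inv(a):
    return a // 3 if a % 3 == 0 else None

h = schroeder_bernstein(f, f_inv, g_inv)
values = [h(a) for a in range(1, 151)]
```

In this example every element of $B$ up to 50 is hit by some element of $A$ up to 150, and no value repeats, which is what the proposition predicts for the restriction of a bijection.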


Next we show that any two sets A and B have comparable cardinalities in the
sense that either card A $\le$ card B or card B $\le$ card A.


Proposition. If A and B are two sets, then either there is a one-one function 
from A into B or there is a one-one function from B into A. 


PROOF. Consider the set $S$ of all one-one functions $f : E \to B$ with $E \subseteq A$,
the empty function with $E = \emptyset$ being one such. Each such function is a certain
subset of $A \times B$. If we order $S$ by inclusion upward, then the union of the members
of any chain is an upper bound for the chain. By Zorn’s Lemma let $G : E_0 \to B$
be a maximal one-one function of this kind, and let $F_0$ be the image of $G$. If
$E_0 = A$, then $G$ is a one-one function from $A$ into $B$. If $F_0 = B$, then $G^{-1}$
is a one-one function from $B$ into $A$. If neither of these things happens, then
there exist $x_0 \in A - E_0$ and $y_0$ in $B - F_0$, and the function $\widetilde{G}$ equal to $G$ on
$E_0$ and having $\widetilde{G}(x_0) = y_0$ extends $G$ and is still one-one; thus it contradicts the
maximality of $G$.


Cantor’s proof that there exist uncountable sets, done with a diagonal argument, 
in fact showed how to start from any set A and construct a set with strictly larger 
cardinality. 


Proposition (Cantor). If $A$ is a set and $2^A$ denotes the set of all subsets of $A$,
then card $2^A$ is strictly larger than card $A$.


PROOF. The map $x \mapsto \{x\}$ is a one-one function from $A$ into $2^A$. If we are
given a one-one function $F : A \to 2^A$, let $E$ be the set of all $x$ in $A$ such that $x$ is
not in $F(x)$. If $F(x_0) = E$, then $x_0 \in E$ implies $x_0 \notin F(x_0) = E$, while $x_0 \notin E$
implies $x_0 \in F(x_0) = E$. We have a contradiction in any case, and hence $E$ is not
in the image of $F$. We conclude that $F$ cannot be onto $2^A$.
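
For a small finite set the diagonal construction can be checked exhaustively. The Python sketch below is only an illustration: the name `diagonal_set` and the three-element set are assumptions made here, and the check runs over all maps $F$ from the set into its subsets, whether one-one or not, since the argument never uses that $F$ is one-one.

```python
from itertools import product

def diagonal_set(A, F):
    """Return E = {x in A : x not in F(x)} for a map F from A to subsets of A."""
    return frozenset(x for x in A if x not in F[x])

A = (0, 1, 2)
# All 8 subsets of A, each stored as a frozenset.
subsets = [frozenset(x for x, keep in zip(A, bits) if keep)
           for bits in product([False, True], repeat=len(A))]

# For every map F : A -> 2^A (8**3 of them), E is never a value of F.
missed = all(diagonal_set(A, dict(zip(A, images))) not in images
             for images in product(subsets, repeat=len(A)))
```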


HINTS FOR SOLUTIONS OF PROBLEMS 


Chapter I 


1. Let $E$ be a nonempty set that is bounded above. Start with a member $s_1$ of $E$.
Choose if possible an $s_2$ in $E$ with $s_2 - s_1 \ge 1$. Continue with $s_3 - s_2 \ge 1$, $s_4 - s_3 \ge 1$,
etc., until this is no longer possible; the existence of an upper bound forces the process
to stop at some stage. Suppose that $s_k$ has been constructed at this stage. Define $s_{k+n}$
inductively for $n \ge 1$ to be a member of $E$ with $s_{k+n} - s_{k+n-1} \ge 2^{-n}$ if possible;
otherwise define $s_{k+n} = s_{k+n-1}$. Then $\{s_n\}$ is bounded and monotone increasing. To
complete the problem, one has only to show that $\lim_n s_n$ is the least upper bound of
$E$.


2. Show that $x_1 \ge \sqrt{a}$ and that $\sqrt{a} \le x_{n+1} \le x_n$ for $n \ge 1$. Then $\lim_n x_n = c$
exists by Corollary 1.6, and $c$ must satisfy $c = \tfrac{1}{2}(c^2 + a)/c$.
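
The monotonicity claimed in this hint is easy to observe numerically. The sketch below assumes that the recursion of the problem is $x_{n+1} = \tfrac12(x_n^2 + a)/x_n$, which is what the limiting equation for $c$ suggests; the function name `sqrt_iterates` and the starting value are illustrative choices.

```python
import math

def sqrt_iterates(a, x0, steps):
    """Iterate x -> (x*x + a) / (2*x), assumed to be the recursion of the problem."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.5 * (x * x + a) / x)
    return xs

xs = sqrt_iterates(2.0, 1.0, 8)   # xs[0] is the starting value, xs[1] is x_1
```

From $x_1$ on, the computed values decrease (up to rounding) toward $\sqrt{2}$, even though the starting value lies below $\sqrt{2}$.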


3. Write out a few cases and guess that the pattern is $a_{2n} = \tfrac{1}{2}(1 - 2^{-(n-1)})$ for
$n \ge 1$ and $a_{2n+1} = 1 - 2^{-n}$ for $n \ge 0$. Prove each of these statements by induction.
Since $a_{2n} \to \tfrac{1}{2}$ and $a_{2n+1} \to 1$ and since these two subsequences use all the terms of
the sequence, the only subsequential limits of $\{a_k\}$ are $\tfrac{1}{2}$ and 1. Therefore $\limsup a_n =
1$ and $\liminf a_n = \tfrac{1}{2}$.


4. The argument without paying attention to finiteness is that $a_n + b_n \le \sup_{r \ge k} a_r +
\sup_{r \ge k} b_r$ for $n \ge k$, then that $\sup_{r \ge k}(a_r + b_r) \le \sup_{r \ge k} a_r + \sup_{r \ge k} b_r$ for all $k$, and
then that the limit of the sum is the sum of the limits.


5. Only (ii) converges uniformly, the reason being that $0 \le x^n/n \le 1/n$ and
that $\lim 1/n = 0$. There is uniform convergence in (i) on $[0, 1 - \epsilon]$ because $0 \le
x^n \le (1 - \epsilon)^n$, and there is uniform convergence in (iii) on $[0, 1 - \epsilon]$ because the
Weierstrass M test applies with $|x^k|/k \le (1 - \epsilon)^k$ and $\sum_k (1 - \epsilon)^k < +\infty$.


6. The uniform convergence of $\sum_{n=0}^{\infty} a_n(x)$ follows from Corollary 1.18, and the
pointwise convergence of $\sum_{n=0}^{\infty} |a_n(x)|$ follows because $(1 - x)\sum_{n=0}^{\infty} x^n = 1$ for
$0 \le x < 1$ and because every $a_n(x)$ is 0 for $x = 1$. The convergence of $\sum_{n=0}^{\infty} |a_n(x)|$
cannot be uniform because the sum is discontinuous and Theorem 1.21 says that it
would have to be continuous.


7. Put $g_n = f - f_n$, so that $g_n$ is continuous and decreases pointwise to the 0
function. Let $x = x_n$ be a point where $g_n(x)$ is a maximum, and let $M_n = g_n(x_n)$.
We are to prove that $M_n$ tends to 0. Suppose it does not. If $k \ge n$, then $M_k =
g_k(x_k) \le g_n(x_k) \le g_n(x_n) = M_n$. So $M_n$ decreases to some $M > 0$. Passing to a
subsequence if necessary, we may assume by the Bolzano–Weierstrass Theorem that
$\lim_n x_n = x'$. For $n \ge k$, we have $g_k(x_n) \ge g_n(x_n) = M_n \ge M$. Letting $n$ tend to
infinity gives $g_k(x') \ge M$ since $g_k$ is continuous. This inequality for all $k$ contradicts
the assumption that $\lim_k g_k(x') = 0$.


8. The idea is to prove the four inequalities

$$\sum_{k=0}^{2m} \frac{(-1)^k x^{2k+1}}{(2k+1)!} \ge \sin x, \qquad \sum_{k=0}^{2m+1} \frac{(-1)^k x^{2k}}{(2k)!} \le \cos x,$$
$$\sum_{k=0}^{2m+1} \frac{(-1)^k x^{2k+1}}{(2k+1)!} \le \sin x, \qquad \sum_{k=0}^{2m+2} \frac{(-1)^k x^{2k}}{(2k)!} \ge \cos x$$

together by an induction. They are to be proved in order for $m = 0$, then in order for
$m = 1$, and so on. In each case of the inductive step, the left side minus the right
side is 0 at $x = 0$ and has derivative equal, up to sign, to the previous left side minus
right side. The Mean Value Theorem says that each left side minus right side at $x > 0$
equals the product of $x$ and this derivative at some $\xi$ with $0 < \xi < x$. Substituting
the previously proved inequality at $\xi$ then gives the result. In other words, everything
comes down to proving the first inequality, namely $x \ge \sin x$ for $x \ge 0$. Arguing in
the same style, we have $x - \sin x = x(1 - \cos \xi)$ with $0 < \xi < x$. So at least $x - \sin x \ge 0$.
For $0 < x \le \pi$, we actually obtain $x - \sin x > 0$. Since $\frac{d}{dx}(x - \sin x) \ge 0$, we have
$x - \sin x \ge \pi - \sin \pi$ for $\pi \le x$. Thus $x - \sin x > 0$ for all $x > 0$.


9. The thing to prove is that the remainder term $\frac{1}{n!}\int_0^x (x - t)^n f^{(n+1)}(t)\,dt$ tends to
0 for each $x$ as $n$ tends to $\infty$. If $x \ge 0$, the absolute value is $\le (n!)^{-1}\int_0^x (x - t)^n\,dt =
x^{n+1}/(n + 1)!$, which tends to 0 for any fixed $x$. If $x < 0$, one argues in a similar
fashion.


10. By a diagonal process we can find a subsequence $\{F_{n_k}\}$ convergent for each
rational $x$. Let $F$ be the resulting limit function, carrying the rationals in $[-1, 1]$ into
$[0, 1]$. If $r$ and $s$ are rationals with $r \le s$, then $F(r) = \lim_k F_{n_k}(r) \le \lim_k F_{n_k}(s) =
F(s)$. Thus $F$ is nondecreasing on the rationals. For each real $x$ with $-1 < x < 1$,
define $F(x^-)$ to be the limit of $F(r)$ with $r$ rational as $r$ increases to $x$, and define
$F(x^+)$ to be the limit of $F(r)$ with $r$ rational as $r$ decreases to $x$. Then $F(x^-) \le F(x^+)$
for each $x$, and $F(x^+) \le F(y^-)$ if $x < y$. For each $N > 0$, it follows that there can
be only finitely many $x$’s for which $F(x^+) - F(x^-) \ge 1/N$, and hence there can be
at most countably many $x$’s for which $F(x^-) \ne F(x^+)$. Let this exceptional set be
denoted by $C$. For $x$ not in $C$, define $F(x) = F(x^+) = F(x^-)$.

For $x$ not in $C$, let us show that $\lim_k F_{n_k}(x)$ exists and equals $F(x)$. If $r \le x$ is
rational, we have $F(r) = \liminf_k F_{n_k}(r) \le \liminf_k F_{n_k}(x)$; taking the supremum over
$r$ gives $F(x) = F(x^-) \le \liminf_k F_{n_k}(x)$. Arguing similarly with $s$ rational and $x \le
s$, we have $\limsup_k F_{n_k}(x) \le \limsup_k F_{n_k}(s) = F(s)$, and hence $\limsup_k F_{n_k}(x) \le
F(x^+) = F(x)$. Combining these two conclusions, we see that $\liminf_k F_{n_k}(x) =
\limsup_k F_{n_k}(x)$ and that the common value of these limits is $F(x)$.

Thus $\{F_{n_k}(x)\}$ converges except possibly for $x$ in $C$. At each point of $C$, the
sequence is bounded. Since $C$ is countable, another use of a diagonal process produces
a subsequence of $F_{n_k}$ that converges at every point of $C$, hence at every point of $[-1, 1]$.


11. If $|x| > 1/\limsup \sqrt[n]{|a_n|}$, then $\sqrt[n]{|a_n|} > 1/|x|$ for infinitely many $n$. Thus
$|a_n x^n| > 1$ for infinitely many $n$, and the terms of the series do not tend to 0.
Hence the series cannot converge. In the reverse direction we want to see that the
inequality $|x| < 1/\limsup \sqrt[n]{|a_n|}$ implies convergence of the series. We rewrite
this as $\limsup \sqrt[n]{|a_n|} < 1/|x|$. Choose a number $r$ with $\limsup \sqrt[n]{|a_n|} < r < 1/|x|$.
Then $\sqrt[n]{|a_n|} \le r$ for all sufficiently large $n$, $\sqrt[n]{|a_n|}\,|x| \le r|x| < 1$ for all $n$ sufficiently
large, and $|a_n x^n| \le (r|x|)^n$ for all $n$ sufficiently large. Thus $\sum |a_n x^n|$ is dominated
term-by-term (from some point on) by the geometric series $\sum s^n$, where $s = r|x|$.
Since $s < 1$, the geometric series converges, and hence so does $\sum |a_n x^n|$.

12. $1/(1 - x)^2 = \sum_{n=0}^{\infty} (n + 1)x^n$, $\log(1 - x) = -\sum_{n=1}^{\infty} x^n/n$, $1/(1 + x^2) =
\sum_{n=0}^{\infty} (-1)^n x^{2n}$, and $\arctan x = \sum_{n=0}^{\infty} (-1)^n x^{2n+1}/(2n + 1)$. All these series have
radius of convergence 1.


13. The proof of existence of $\arccos x$ uses the proposition in Section A3 of
the appendix. The result of the calculation of the derivative is that $\frac{d}{dx}\arccos x =
-1/\sqrt{1 - x^2}$ for $|x| < 1$. Then $\arcsin x + \arccos x$ has derivative 0 on $(0, \pi/2)$ and
hence is constant. The constant is evaluated by putting $x = 0$, and the result is that
$\arcsin x + \arccos x = \pi/2$ on $(0, \pi/2)$.

14. The uniform version of Abel’s Theorem is this: Let $\{a_n(x)\}_{n \ge 0}$ be a sequence
of complex-valued functions with $\sum_{n=0}^{\infty} a_n(x)$ converging uniformly to the limit $s(x)$.
Then $\lim_{r \uparrow 1} \sum_{n=0}^{\infty} a_n(x) r^n = s(x)$ uniformly in $x$. The proof is just a matter of seeing
that the estimates in the proof of Theorem 1.48 can be made uniform in $x$ under the
stated assumptions. The result about Cesàro sums is handled similarly.


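15. Write $\cos n\theta = \tfrac{1}{2}(e^{in\theta} + e^{-in\theta})$ and $\sin n\theta = \tfrac{1}{2i}(e^{in\theta} - e^{-in\theta})$. Then

$$\sum_{n=0}^{N} \cos n\theta = \frac{1}{2}\sum_{n=0}^{N} e^{in\theta} + \frac{1}{2}\sum_{n=0}^{N} e^{-in\theta} = \frac{1}{2}\,\frac{1 - e^{i(N+1)\theta}}{1 - e^{i\theta}} + \frac{1}{2}\,\frac{1 - e^{-i(N+1)\theta}}{1 - e^{-i\theta}}.$$

Each numerator is bounded by 2, and each denominator gets close to 0 only as $\theta$ tends
to a multiple of $2\pi$. This proves the estimate for the cosines, and the estimate for the
sines works in the same way.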


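17. For (a), the relevant result is that when all $a_n$ are 0, $\sum_{n=1}^{\infty} |b_n|^2$ equals
$\frac{1}{\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx$. Here $\sum_{n=1}^{\infty} |b_n|^2$ is $(4/\pi)^2 \sum_{n=1}^{\infty} \frac{1}{(2n-1)^2}$, and $\frac{1}{\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx$
is just $\frac{2\pi}{\pi} = 2$. Hence $\sum_{n=1}^{\infty} \frac{1}{(2n-1)^2} = \frac{\pi^2}{8}$.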
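
As a numerical sanity check on the conclusion of part (a), the sum of the reciprocals of the odd squares can be compared with $\pi^2/8$. The sketch below is only an illustration; the cutoff of 100000 terms is an arbitrary choice, and the neglected tail is roughly $1/(4 \cdot 100000)$.

```python
import math

# Partial sum of 1/(2n-1)^2 for n = 1, ..., 100000.
partial = sum(1.0 / (2 * n - 1) ** 2 for n in range(1, 100001))
target = math.pi ** 2 / 8
gap = target - partial   # positive and small, since all terms are positive
```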


18. We have $F(x) f(y) = \int_0^x f(t) f(y)\,dt = \int_0^x f(t + y)\,dt = \int_y^{x+y} f(t)\,dt =
F(x + y) - F(y)$. If $F(x) \ne 0$ for some $x$, we can divide and use the Fundamental Theorem
of Calculus to see that $f(y)$ has a continuous derivative everywhere. (If $F(x) =
0$ for all $x$, then differentiation gives $f(x) = 0$ for all $x$.) Differentiating the original
identity in $x$ gives $f'(x) f(y) = f'(x + y)$. When $x = 0$, we obtain $f'(0) f(y) =
f'(y)$. Then $\frac{d}{dy}\bigl(f(y) e^{-f'(0)y}\bigr) = f'(y) e^{-f'(0)y} - f(y) f'(0) e^{-f'(0)y} = 0$, and
hence $f(y) e^{-f'(0)y}$ is constant. Thus $f(y) = a e^{f'(0)y}$. In the original identity
$f(x) f(y) = f(x + y)$, if we put $x = 0$ and choose $y$ such that $f(y) \ne 0$, then we
see that $f(0) = 1$. Hence $f(y) = e^{f'(0)y}$ if $f$ is not identically 0.

19. We may assume that $f$ is not identically 0. As in Problem 18, we have
$f(0) = 1$. By continuity of $f$, choose $x_0$ such that $|f(x) - 1| < 1$ when $|x| \le |x_0|$.
Then $\mathrm{Re}\, f(x_0) > 0$, and we can choose a unique $c$ with $|\mathrm{Im}(c x_0)| < \pi/2$ such
that $e^{c x_0} = f(x_0)$. The equation for $f$ shows that $f(\tfrac{1}{2}x_0)^2 = f(x_0)$, and hence
$f(\tfrac{1}{2}x_0)$ equals $e^{c x_0/2}$ or $-e^{c x_0/2}$. From $|f(\tfrac{1}{2}x_0) - 1| < 1$ we have $\mathrm{Re}\, f(\tfrac{1}{2}x_0) > 0$.
Since $|\mathrm{Im}(c x_0/2)| < \pi/2$, $e^{c x_0/2}$ is the choice of square root of $e^{c x_0}$ with positive
real part, and we conclude that $f(\tfrac{1}{2}x_0) = e^{c x_0/2}$. Iterating this argument, we obtain
$f(2^{-n}x_0) = e^{c 2^{-n} x_0}$ for all $n \ge 0$. The equation for $f$ shows that $f(kx) = f(x)^k$ for
all integers $k \ge 0$, and thus $f(q x_0) = e^{c q x_0}$ for every rational $q$ of the form $k/2^n$ with
$k$ an integer $\ge 0$. From $f(x) f(-x) = f(0) = 1$, we have $f(-x) = f(x)^{-1}$, and
thus $f(q x_0) = e^{c q x_0}$ for every rational number $q$ of the form $k/2^n$ with $k$ any integer.
Using continuity and passing to the limit, we obtain $f(r x_0) = e^{c r x_0}$ for all real $r$.


21. This uses the discussion at the end of Section A2 of the appendix. For $x \ne 0$,
we compute that $g'(x) = (R(x)/S(x)) e^{-1/x^2}$ for polynomials $R$ and $S$ with $S$ not the
0 polynomial. Then $\lim_{x \to 0} g'(x) = 0$ by Problem 20, and the appendix shows that
$g'(0)$ exists and equals 0.

22. Use Problem 21 and induction. 

23. Since $\{s_n\}$ is convergent, it is bounded. Say $|s_n| \le K$ for all $n$. Let $\epsilon > 0$
be given, and choose $N$ such that $n \ge N$ implies $|s_n - s| < \epsilon/2$. Write $t_n - s =
\sum_{j=0}^{\infty} M_{nj} s_j - s = \sum_{j=0}^{\infty} M_{nj}(s_j - s)$ by (i). A second application of (i) gives

$$|t_n - s| \le \sum_{j=0}^{N} M_{nj}(|s_j| + |s|) + \sum_{j=N+1}^{\infty} M_{nj}|s_j - s|$$
$$\le 2K \sum_{j=0}^{N} M_{nj} + \sum_{j=N+1}^{\infty} M_{nj}\,\epsilon/2 \le 2K \sum_{j=0}^{N} M_{nj} + \epsilon/2.$$

Since $N$ is fixed, (ii) shows that $2K \sum_{j=0}^{N} M_{nj} < \epsilon/2$ for $n$ sufficiently large. For
those $n$, $|t_n - s| < \epsilon$.

24. For Cesàro summability the $i$th row, for $i \ge 1$, has its first $i$ entries equal to
$1/i$ and its remaining entries equal to 0. For Abel summability the row going with $r_i$
has $j$th entry $(1 - r_i)(r_i)^j$ for $j \ge 0$.
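
The two kinds of rows just described can be written down directly and applied to a convergent sequence. The following Python sketch is illustrative only: the names `cesaro_row`, `abel_row`, and `transform` are not from the text, the rows are truncated to finite length, and the sample sequence $s_n = 1 + (-1)^n/(n+1)$ is a hypothetical choice with limit 1.

```python
def cesaro_row(i, length):
    """Row i >= 1: first i entries equal to 1/i, the rest 0 (truncated to `length`)."""
    return [1.0 / i if j < i else 0.0 for j in range(length)]

def abel_row(r, length):
    """Row for 0 <= r < 1: j-th entry (1 - r) * r**j, truncated to `length` entries."""
    return [(1.0 - r) * r ** j for j in range(length)]

def transform(row, s):
    """Apply one row of the matrix to the sequence s."""
    return sum(m * x for m, x in zip(row, s))

s = [1.0 + (-1.0) ** n / (n + 1) for n in range(2000)]   # converges to 1
t_cesaro = transform(cesaro_row(2000, 2000), s)
t_abel = transform(abel_row(0.99, 2000), s)
```

Both transformed values land close to the limit 1, in line with the conclusion of Problem 23 for matrices with nonnegative entries, unit row sums, and columns tending to 0.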

25. Certainly $M_{ij} \ge 0$ for all $i$ and $j$. The power series in Problem 12a shows that
$\sum_j M_{ij} = 1$ for all $i$, and (ii) holds because $\lim_{r \uparrow 1} (j + 1) r^j (1 - r)^2 = (j + 1) \cdot 1 \cdot 0 = 0$.

26. Check that $M$ as in the previous problem transforms the Cesàro sums into the
Abel sums, and apply Problem 23.


27. This is handled by the same kind of computation as with the Fejér kernel. 


28. The formula for $P_r(\theta)$ comes from summing the two geometric series for $n \ge 0$
and $n < 0$ and then adding the results. Properties (i) and (iii) are then immediate
by inspection. For property (ii) we use the series expansion of $P_r(\theta)$. Theorem 1.31
allows the integration to be done term by term, and the result follows.


29. This is proved in the same way as Fejér’s Theorem (Theorem 1.59). 


30. Corollary 1.38 shows that $f_k'(x) = \sum_{n=1}^{\infty} n c_{n,k} x^{n-1}$ and that $f_k''(x) =
\sum_{n=2}^{\infty} n(n - 1) c_{n,k} x^{n-2}$ for $|x| < R$. The point is to show that $\{f_k'(x)\}$ is uniformly
bounded and uniformly equicontinuous for $|x| \le r$, and then Ascoli’s Theorem
produces the required subsequence. For proving the equicontinuity, it is enough to
prove that $\{f_k''(x)\}$ is uniformly bounded for $|x| \le r$.

Fix $r < R$, and choose $r_1$ with $r < r_1 < R$. Since $\lim f_k(x) = f(x)$ uniformly for
$|x| \le r_1$, there is an $M$ such that $|f_k(r_1)| \le M$ for all $k$. Thus $\bigl|\sum_n c_{n,k} r_1^n\bigr| \le M$ for
all $k$. Since $c_{n,k} \ge 0$ for all $n$ and $k$, $c_{n,k} \le M r_1^{-n}$ for all $n$ and $k$. Since $r < r_1$, choose
$N$ such that $n \ge N$ implies $n(r/r_1)^{n-1} \le 1$ and $n(n - 1)(r/r_1)^{n-2} \le 1$.
Since $c_{n,k} \ge 0$ for all $n$ and $k$, $n c_{n,k}|x|^{n-1} \le n c_{n,k} r^{n-1} = (c_{n,k} r_1^{n-1})(n(r/r_1)^{n-1}) \le
c_{n,k} r_1^{n-1}$ for $n \ge N$ and $|x| \le r$. Summing on $n \ge N$ and taking Corollary 1.38 into
account, we see that

$$\Bigl| f_k'(x) - \sum_{n=1}^{N-1} n c_{n,k} x^{n-1} \Bigr| \le r_1^{-1}\Bigl( f_k(r_1) - \sum_{n=0}^{N-1} c_{n,k} r_1^n \Bigr) \le r_1^{-1} f_k(r_1) \le r_1^{-1} M$$

for $|x| \le r$. Thus $|x| \le r$ implies that $|f_k'(x)|$ is

$$\le r_1^{-1} M + \sum_{n=1}^{N-1} n c_{n,k} |x|^{n-1} \le r_1^{-1} M + \sum_{n=1}^{N-1} n M r_1^{-n} r^{n-1} \le r_1^{-1} M + N(N - 1) M r_1^{-1},$$

and $\{f_k'(x)\}$ is uniformly bounded for $|x| \le r$.

A similar argument with $f_k''$ shows that

$$\Bigl| f_k''(x) - \sum_{n=2}^{N-1} n(n - 1) c_{n,k} x^{n-2} \Bigr| \le r_1^{-2} M,$$

and we find similarly that $\{f_k''(x)\}$ is uniformly bounded for $|x| \le r$. This completes
the proof.


31. Theorem 1.23 shows that the limit of the subsequence of first derivatives is
the first derivative of the limit, the limit being differentiable. In other words, $f$ is
differentiable for $|x| < r$, and the subsequence converges to $f'(x)$ there. Since $r < R$
is arbitrary, $f$ is differentiable for $|x| < R$. Now we can induct, replacing $f$ and the
sequence $f_k$ in Problem 30 by $f'$ and a subsequence of $f_k'$ on a smaller disk, then
passing to $f''$, and so on. The result is that $f$ is infinitely differentiable for $|x| < R$.


32. This is proved in the same way as in Problem 9. 
33. $\bigl|\tfrac{1}{n} z^n\bigr| \le r^n$ if $|z| \le r$, and $\sum_{n=N}^{\infty} r^n = r^N/(1 - r)$. Thus $\bigl|\tfrac{1}{N} z^N +
\tfrac{1}{N+1} z^{N+1} + \cdots\bigr|$ tends uniformly to 0 for $|z| \le r$. Since $t \mapsto \exp(t)$ is continuous at
$t = 0$, the required convergence follows.

34. Corollary 1.38 shows from the behavior for $z$ real that all $c_n$ are 0.

35. Write

$$\exp\bigl(z + \tfrac{1}{2}z^2 + \tfrac{1}{3}z^3 + \cdots\bigr) = \Bigl(\prod_{k=1}^{N} \exp\bigl(\tfrac{1}{k} z^k\bigr)\Bigr) \exp\bigl(\tfrac{1}{N+1} z^{N+1} + \tfrac{1}{N+2} z^{N+2} + \cdots\bigr).$$

Problem 33 shows that the left side is the uniform limit of $\prod_{k=1}^{N} \exp\bigl(\tfrac{1}{k} z^k\bigr)$ for $|z| \le r$
if $r < 1$. Each factor of the finite product is given by a convergent power series
with nonnegative coefficients, and Theorem 1.40 shows that the finite product is
given by a convergent power series with nonnegative coefficients. By Problem 32,
$\exp\bigl(z + \tfrac{1}{2}z^2 + \tfrac{1}{3}z^3 + \cdots\bigr)$ is given by a convergent power series for $|z| < 1$. Hence
$\exp\bigl(z + \tfrac{1}{2}z^2 + \tfrac{1}{3}z^3 + \cdots\bigr) - 1/(1 - z)$ is given by a convergent power series for
$|z| < 1$. For $z = x$ real with $|x| < 1$, the series expansion of Problem 12b shows that
our expression is $\exp(-\log(1 - x)) - 1/(1 - x) = 0$. Thus our power series sums
to 0 on the real axis. By Problem 34, it sums to 0 everywhere.
Chapter II


1. Let us compare $d(x, y)$ with $d(x, z) + d(z, y)$. If $j$ contributes to $d(x, y)$, then
$x_j \ne y_j$. Hence $x_j \ne z_j$ or $z_j \ne y_j$. Thus $j$ contributes to at least one of $d(x, z)$ and
$d(z, y)$. In other words, the contribution of $j$ to $d(x, y)$ is $\le$ the contribution of $j$ to
$d(x, z) + d(z, y)$. Summing on $j$ gives the desired result.
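
The coordinate-by-coordinate argument can be checked by brute force on a small example. The sketch below assumes that $d(x, y)$ simply counts the indices $j$ with $x_j \ne y_j$ (the problem may weight the coordinates differently); the name `hamming` and the choice of binary words of length 4 are illustrative.

```python
from itertools import product

def hamming(x, y):
    """Number of coordinates j with x[j] != y[j] (assumed form of the metric)."""
    return sum(1 for a, b in zip(x, y) if a != b)

words = list(product((0, 1), repeat=4))
# Triangle inequality checked over all 16**3 triples of words.
triangle_ok = all(hamming(x, y) <= hamming(x, z) + hamming(z, y)
                  for x in words for y in words for z in words)
```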

2. Let $(X, d)$ be the given separable metric space, define $E$ to be the subset of
members $x$ of $X$ such that every neighborhood of $x$ is uncountable, and let $F$ be the
complement of $E$. If $x$ is in $F$, we can associate to $x$ some open neighborhood $N_x$
containing at most countably many elements, and $N_x$ is entirely contained in $F$. As
$x$ varies in $F$, the sets $N_x$ form an open cover of $F$. By Proposition 2.32b, some
subcollection of the $N_x$ that is at most countable covers $F$. The union of these sets is
open and is at most countable, and it equals $F$.

3. Let $f(x) = 1/x$ for $0 < x \le 1$, and let $f(0) = 0$.

4. Suppose that $x$ is in $U$. Since $A$ is dense, the set $A \cap B(1/n; x)$ is nonempty
for each $n \ge 1$. Let $x_n$ be a member of it. Since $U$ is open, $B(1/n; x)$ is contained
in $U$ if $n$ is $\ge N$ for a suitable $N$. Thus $x_n$ is in $A \cap U$ for $n \ge N$ and converges to
$x$. By Proposition 2.22b, either $x_n = x$ infinitely often, in which case $x$ is in $A \cap U$,
or $x$ is a limit point of $A \cap U$. In either case, $U \subseteq (A \cap U)^{\mathrm{cl}}$.


5. For (a), the sets $E_n$ are compact by the Heine–Borel Theorem. Then each $E_n - U$
is compact. Their intersection is $\bigcap_{n=1}^{\infty} (E_n \cap U^c) = \bigl(\bigcap_{n=1}^{\infty} E_n\bigr) \cap U^c \subseteq U \cap U^c = \emptyset$.
By Proposition 2.35 the system $\{E_n - U\}$ does not have the finite-intersection property.
Thus $\bigcap_{n=1}^{N} (E_n - U) = \emptyset$ for some $N$. Since $E_1 \supseteq E_2 \supseteq \cdots$, we find that
$E_N - U = \emptyset$. Therefore $E_N \subseteq U$.

For (b), let $U$ be empty, and let $E_n = \mathbb{Q} \cap [\sqrt{2}, \sqrt{2} + 1/n]$.
6. In both parts of the problem, let the metrics be $d_X$, $d_Y$, $d_Z$. For (a), use continuity
of $F$ to choose for each $(x_0, y)$ some $\delta_{1,y} > 0$ and $\delta_{2,y} > 0$ such that the two inequalities
$d_X(x, x_0) < \delta_{1,y}$ and $d_Y(y', y) < \delta_{2,y}$ together imply $d_Z(F(x, y'), F(x_0, y)) <
\epsilon/2$. As $y$ varies, the open balls $B(\delta_{2,y}; y)$ cover $Y$. Since $Y$ is compact, a finite
number of them suffice to cover $Y$, say $B(\delta_{2,y_1}; y_1), \dots, B(\delta_{2,y_n}; y_n)$. Put
$\delta_1 = \min\{\delta_{1,y_1}, \dots, \delta_{1,y_n}\}$. Suppose now that $d_X(x, x_0) < \delta_1$ and that $y'$ is in
$Y$. Then $y'$ is in some $B(\delta_{2,y_j}; y_j)$. Hence we have $d_X(x, x_0) < \delta_1 \le \delta_{1,y_j}$ and
$d_Y(y', y_j) < \delta_{2,y_j}$, and we therefore obtain $d_Z(F(x, y'), F(x_0, y_j)) < \epsilon/2$. Since also
$d_X(x_0, x_0) = 0$ and $d_Y(y', y_j) < \delta_{2,y_j}$, we obtain also $d_Z(F(x_0, y'), F(x_0, y_j)) < \epsilon/2$.
Combining these two results gives $d_Z(F(x, y'), F(x_0, y')) < \epsilon$.

For (b), consider $d_Z(F(x, y), F(x_0, y_0))$, and let $\epsilon > 0$ be given. By uniform convergence,
choose $\delta_1 > 0$ such that $d_X(x, x_0) < \delta_1$ implies $d_Z(F(x, y), F(x_0, y)) <
\epsilon/2$ for all $y$. Proposition 2.21 gives us continuity of $F(x_0, \cdot)$, and thus there
exists $\delta_2 > 0$ such that $d_Y(y, y_0) < \delta_2$ implies $d_Z(F(x_0, y), F(x_0, y_0)) < \epsilon/2$.
Then $d_X(x, x_0) < \delta_1$ and $d_Y(y, y_0) < \delta_2$ together imply $d_Z(F(x, y), F(x_0, y_0)) \le
d_Z(F(x, y), F(x_0, y)) + d_Z(F(x_0, y), F(x_0, y_0)) < \epsilon/2 + \epsilon/2 = \epsilon$.

7. Let $f : (0, 1) \to \mathbb{R}$ be defined by $f(x) = 1/x$. Then the Cauchy sequence
$\{1/n\}$ is carried to a sequence that is not Cauchy in $\mathbb{R}$.


8. Define inductively $f^{(0)}$ to be the identity and $f^{(k)} = f \circ f^{(k-1)}$ for $k > 0$.
For existence we see inductively that $d(f^{(k)}(x), f^{(k)}(y)) \le r^k d(x, y)$ for all $x$ and
$y$. If $n \ge m$ and if $x$ is arbitrary but fixed, we then have $d(f^{(n)}(x), f^{(m)}(x)) \le
\sum_{k=m}^{n-1} d(f^{(k+1)}(x), f^{(k)}(x)) \le \sum_{k=m}^{n-1} r^k d(f(x), x) \le r^m d(f(x), x)/(1 - r)$. Hence
the sequence $\{f^{(n)}(x)\}$ is Cauchy. Let $x'$ be its limit. Since $d(f(f^{(n)}(x)), f^{(n)}(x)) =
d(f^{(n+1)}(x), f^{(n)}(x)) \le r^n d(f(x), x)/(1 - r)$ and since $d$ and $f$ are continuous,
$d(f(x'), x') \le \limsup_n r^n d(f(x), x)/(1 - r) = 0$. Thus $f(x') = x'$.

For uniqueness, let $f(x'') = x''$ also. Then $d(x'', x') = d(f(x''), f(x'))$ since $f$
fixes $x'$ and $x''$, and $d(f(x''), f(x')) \le r\,d(x'', x')$ by the contraction property. Then
$(1 - r) d(x'', x') \le 0$ and we conclude that $d(x'', x') = 0$. Thus $x'' = x'$.
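
The existence argument is constructive, and the estimate $d(f^{(n)}(x), x') \le r^n d(f(x), x)/(1 - r)$ can be observed numerically. The Python sketch below is a hypothetical illustration, not part of the problem: it uses $f = \cos$ on $[0, 1]$, which maps that interval into itself and is a contraction there with constant $r = \sin 1 < 1$; the name `fixed_point` and the tolerance are choices made here.

```python
import math

def fixed_point(f, x, tol=1e-14, max_iter=10_000):
    """Iterate x -> f(x) until successive values differ by at most tol."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx - x) <= tol:
            return fx
        x = fx
    raise RuntimeError("iteration did not converge")

r = math.sin(1.0)                      # contraction constant of cos on [0, 1]
x_start = 1.0
x_star = fixed_point(math.cos, x_start)

# A priori bounds r**n * d(f(x), x) / (1 - r) for n = 0, ..., 29.
d0 = abs(math.cos(x_start) - x_start)
bounds = [r ** n * d0 / (1.0 - r) for n in range(30)]
```

The bounds are usually far from sharp; their value lies in being computable before the fixed point is known.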

9. If no point is isolated, each one-point set is closed nowhere dense. The countable
union of these sets is the whole space, in contradiction to the Baire Category Theorem. 
An alternative argument is to appeal to Problem 2. 


10. The set is closed and bounded, hence compact, and it is pathwise connected, 
hence connected. It is not, however, locally connected. Take, for example, the point 
p = [c, 1/2] in X, where c is in C. The open ball of radius 1/4 around p has the 
property that no open subneighborhood of p is connected. 


11. Fix $x_0$ in $X$, and let $U$ be the set of all points in $X$ that can be connected to
$x_0$ by paths. The set $U$ is nonempty, and we prove that it is open and closed. Since $X$ is
connected, $U$ must then be all of $X$. It is open because the local pathwise connectedness
means that any $x$ in $U$ can be connected to every point in some neighborhood of $x$
by a path; hence $U$ contains a neighborhood of each of its points and is open. To
see that $U$ is closed, let $y$ be a limit point of $U$. If $V$ is a pathwise connected open
neighborhood of $y$, the set $U \cap V$ is nonempty because $y$ is a limit point of $U$. Let $z$
be in $U \cap V$. Then $x_0$ can be connected to $z$ by a path because of the defining property
of $U$, and $z$ can be connected to $y$ by a path because $V$ is pathwise connected. Hence
$x_0$ can be connected to $y$ by a path, and $y$ is in $U$.

12. Any open subset of R” is locally pathwise connected. So the desired conclusion 
follows from the previous problem. 

13. Let the open set be $U$. For each $x$ in $U$, let $U_x$ be the union of all connected
subsets of $U$ containing $x$. It was shown in Section 8 that this is connected. For $x$ and
$y$ in $U$, either $U_x = U_y$ or $U_x \cap U_y = \emptyset$ for the same reason. Then $U$ is the disjoint
union of its subsets $U_x$, which are connected. These are intervals, being connected,
and they must be open in order not to be contained in larger connected subsets of $U$.

14. Same as for Proposition 2.21. 

15. Suppose $\{f_t\}$ is totally bounded. Let $\epsilon > 0$ be given. Find, by total boundedness,
real numbers $t_1, \dots, t_n$ such that for any $t$, there is an index $j = j(t)$ with
$\|f_t - f_{t_j}\| < \epsilon$. Put $L/2 = \max\{|t_1|, \dots, |t_n|\}$. If we are given an interval of length
$\ge L$, take $t$ to be its center, so that the interval contains $[t - L/2, t + L/2]$. Choose
$j$ by total boundedness with $\|f_t - f_{t_j}\| < \epsilon$. Then $\|f_{t - t_j} - f_0\| < \epsilon$. So $t - t_j$ is an
$\epsilon$ almost period, and this lies in $[t - L/2, t + L/2]$. Thus the Bohr condition holds.

Conversely suppose that the Bohr condition holds and $f$ is uniformly continuous.
Let $\epsilon > 0$ be given, and find $L$ as in the Bohr condition for $\epsilon/2$ almost periods. Also,
find some $\delta$ for uniform continuity of $f$ and the number $\epsilon/2$. Choose $t_1, \dots, t_n$ in
$I = [-L/2, L/2]$ such that any point in $I$ is within $\delta$ of one of $t_1, \dots, t_n$. Let us see
that the open balls of radius $\epsilon$ around $f_{t_1}, \dots, f_{t_n}$ together cover the set $\{f_t\}$ of all
translates. If $t$ is given, find an $\epsilon/2$ almost period $t - s$ in $[t - L/2, t + L/2]$. Here
$|s| \le L/2$, so that $\|f_{t-s} - f_0\| < \epsilon/2$ and $\|f_t - f_s\| < \epsilon/2$. Since $|s - t_j| < \delta$ for
some $j$, we have $\|f_s - f_{t_j}\| < \epsilon/2$ by uniform continuity. Thus $\|f_t - f_{t_j}\| < \epsilon$.

16. Let $T_f$ be the closure of the set of translates of $f$. This is complete by
Problem 14. Theorem 2.36 shows that $T_f$ is compact if and only if every sequence in it
has a convergent subsequence, and this is the definition of Bochner almost periodicity.
Theorem 2.46 shows that $T_f$ is compact if and only if it is totally bounded, and this
is equivalent to Bohr almost periodicity by Problem 15.

17. This is easier with the Bochner definition. For an example of closure under 
the various operations, consider closure under multiplication. Suppose that f and g 
are given and that we want a convergent subsequence from the sequence of translates 
$(fg)_{t_n}$. First choose a subsequence of $\{t_n\}$ such that those translates of $f$ converge
uniformly, and then choose a subsequence of that such that the translates of g converge 
uniformly. These sequences of translates of f and g will be uniformly bounded, and 
then it follows that the sequence of products converges uniformly. 

For closure under uniform limits, we argue similarly with translates of each of the 
functions $\{f_n\}$ when $\lim f_n = f$ uniformly. A Cantor diagonal process is used to
extract the sequence of translates to use for f. 

18. If $\epsilon > 0$ is given, let $U_n$ be the set where $|f(x) - f_n(x)| < \epsilon$. This is open
by the assumed continuity, and $\bigcup_n U_n = X$ by the assumed convergence. Since
$X$ is compact, some finite collection of $U_n$'s suffices. Since the $f_n$'s are pointwise
increasing with $n$, the $U_n$'s are increasing, and thus $X = U_N$ for some $N$. For that
$N$, $|f(x) - f_N(x)| < \epsilon$. Then $|f(x) - f_n(x)| < \epsilon$ for $n \geq N$ since the $f_n$'s are
pointwise increasing.

19. If $0 \leq P_n(x) \leq \sqrt{x} \leq 1$, then $x \geq P_n(x)^2$ and the recursion shows that
$P_{n+1}(x) \geq P_n(x)$. Also, $P_{n+1}(x) = P_n(x) + \frac{1}{2}(\sqrt{x} + P_n(x))(\sqrt{x} - P_n(x)) \leq
P_n(x) + \frac{1}{2}(1 + 1)(\sqrt{x} - P_n(x)) = \sqrt{x}$.

20. By Problem 19, $P_n(x)$ increases pointwise to some $f(x)$. Passing to the
limit in the recursion gives $f(x) = f(x) + \frac{1}{2}(x - f(x)^2)$, and thus $f(x)^2 = x$
and $f(x) = \sqrt{x}$. Since $\sqrt{x}$ is continuous and $[0, 1]$ is compact, Dini's Theorem
(Problem 18) shows that the convergence is uniform.
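
The monotone and uniform convergence in Problems 19 and 20 can be observed numerically. The following is only a sketch: it assumes the starting polynomial is $P_0(x) = 0$ (the solution above does not restate it), and the names `sqrt_iterates` and `sup_error` are illustrative.

```python
import math

def sqrt_iterates(x, steps):
    """Return [P_0(x), ..., P_steps(x)] for the recursion
    P_{n+1}(x) = P_n(x) + (x - P_n(x)**2) / 2, assuming P_0(x) = 0."""
    values = [0.0]
    for _ in range(steps):
        p = values[-1]
        values.append(p + 0.5 * (x - p * p))
    return values

def sup_error(steps, grid=1000):
    """Approximate the sup over [0, 1] of sqrt(x) - P_steps(x) on a uniform grid."""
    return max(
        math.sqrt(k / grid) - sqrt_iterates(k / grid, steps)[-1]
        for k in range(grid + 1)
    )

if __name__ == "__main__":
    for n in (2, 20, 200):
        print(n, sup_error(n))
```

The printed grid errors shrink as the number of steps grows, which is what Dini's Theorem predicts for the supremum over $[0, 1]$.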


21. If $x$ and $y$ are given, then we are given three relevant functions in $A$, possibly
not all distinct. They are $h_1$ with $h_1(x) \neq h_1(y)$, $h_2$ with $h_2(x) \neq 0$, and $h_3$ with
$h_3(y) \neq 0$. If $h_1(x)$ or $h_1(y)$ is 0, we can add a multiple of $h_2$ or $h_3$ to $h_1$ to obtain
an $h_4$ with $h_4(x) \neq h_4(y)$, $h_4(x) \neq 0$, and $h_4(y) \neq 0$. The restrictions of $h_4$ and $h_4^2$
to the two-element set $\{x, y\}$ are linearly independent and therefore form a basis for
the 2-dimensional space of restrictions. Hence some linear combination of $h_4$ and $h_4^2$
equals the given $f$ at $x$ and $y$.


22. Let f be in Cp(S) with f(so) = 0. Since B* = Cp(S), there exists 
a sequence $\{g_n\}$ in $B$ with $\lim g_n = f$ uniformly. Then $\lim g_n(s_0) = f(s_0) = 0$
in particular. Put $f_n(s) = g_n(s) - g_n(s_0)$. Then $f_n(s_0) = 0$. The inequality
$|f_n(s) - f(s)| = |g_n(s) - f(s) - g_n(s_0)| \leq |g_n(s) - f(s)| + |g_n(s_0)|$ shows that $\{f_n\}$
converges uniformly to $f$. The members of $A$ are the members of $B$ that vanish at
$s_0$. The functions $f_n$ have this property, and thus $\{f_n\}$ is a sequence in $A$ converging
uniformly to $f$.


24. For (a), we identify $C_0([0, +\infty), \mathbb{R})$ with the subalgebra of $C([0, +\infty], \mathbb{R})$
of continuous functions equal to 0 at $+\infty$. The function $e^{-x}$ separates points on
$[0, +\infty]$. Apply Problem 22 to the algebra it generates, namely the algebra of all
finite linear combinations of $e^{-nx}$ for $n$ a positive integer.

For (b), let $\epsilon > 0$ be given, and choose $g(x) = \sum c_n e^{-nx}$ by (a) such that
$\sup_{0 \leq x < +\infty} |f(x) - g(x)| < \epsilon$. The hypothesis forces $\int_0^b f(x)g(x)\,dx = 0$, and this
is $\int_0^b f(x)^2\,dx - \int_0^b f(x)(f(x) - g(x))\,dx$. Thus

\[ 0 \geq \int_0^b f(x)^2\,dx - \Bigl|\int_0^b f(x)\bigl(f(x) - g(x)\bigr)\,dx\Bigr|. \]

So $\int_0^b f(x)^2\,dx \leq \epsilon \int_0^b |f(x)|\,dx$. Since $\epsilon$ is arbitrary, $\int_0^b f(x)^2\,dx = 0$. Therefore
$f = 0$.


25. Isometries are uniformly continuous. Applying Proposition 2.47 to the uniformly
continuous function $\varphi_2 \circ (\varphi_1^{-1}|_{\varphi_1(X)})$ of the dense subset $\varphi_1(X)$ of $X_1^*$ into $X_2^*$,
we obtain an isometry $\Psi : X_1^* \to X_2^*$ extending $\varphi_2 \circ (\varphi_1^{-1}|_{\varphi_1(X)})$. Reversing the roles
of $X_1^*$ and $X_2^*$, we obtain an isometry $\Phi : X_2^* \to X_1^*$ extending $\varphi_1 \circ (\varphi_2^{-1}|_{\varphi_2(X)})$. Then
$\Phi \circ \Psi$ is a continuous extension of the composition $\varphi_1 \circ (\varphi_2^{-1}|_{\varphi_2(X)}) \circ \varphi_2 \circ (\varphi_1^{-1}|_{\varphi_1(X)})$,
which is the identity map on $\varphi_1(X)$. Hence $\Phi \circ \Psi$ is the identity on $X_1^*$. Similarly
$\Psi \circ \Phi$ is the identity on $X_2^*$. Thus $\Psi$ is onto. This proves existence.

For uniqueness let $\Psi$ and $\Psi^*$ be two such maps. Then $\Psi^{-1} \circ \Psi^*$ is a continuous
extension of the identity map on the dense subset $\varphi_1(X)$ of $X_1^*$, and hence it is the
identity. Therefore $\Psi = \Psi^*$.


26. Theorem 2.60 says that $X$ is dense in $X^*$. Then $X = X^*$ if and only if $X$ is
closed, and this happens if and only if X is complete, by Proposition 2.43. 


27. The only one of these that requires explanation is (iv). We may assume
that none of $r$, $s$, and $r + s$ is 0. Write $r = mp^k/n$ and $s = up^l/v$ with $p$ not
dividing any of $m, n, u, v$. Without loss of generality, we may assume $k \leq l$, so that
$\max\{|r|_p, |s|_p\} = |r|_p = p^{-k}$. We have

\[ r + s = mp^k/n + up^l/v = p^k\Bigl(\frac{m}{n} + \frac{up^{l-k}}{v}\Bigr) = p^k\Bigl(\frac{mv + nup^{l-k}}{nv}\Bigr). \]

The denominator $nv$ is not divisible by $p$. The part of the numerator within the
parentheses is an integer, and we factor out any factors of $p$ from it as $p^a$ with $a \geq 0$.
Then we have $|r + s|_p = p^{-k-a}$, and this is $\leq p^{-k}$ as required.
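
The factoring step in Problem 27 can be mirrored in exact rational arithmetic. The sketch below is an illustration only; the function names `padic_valuation` and `padic_abs` are hypothetical and not taken from the text.

```python
from fractions import Fraction

def padic_valuation(r, p):
    """Exponent k with r = p**k * m / n, where p divides neither m nor n (r nonzero)."""
    r = Fraction(r)
    if r == 0:
        raise ValueError("the valuation of 0 is +infinity")
    k, num, den = 0, r.numerator, r.denominator
    while num % p == 0:      # pull factors of p out of the numerator
        num //= p
        k += 1
    while den % p == 0:      # pull factors of p out of the denominator
        den //= p
        k -= 1
    return k

def padic_abs(r, p):
    """|r|_p = p**(-k) as an exact Fraction, with |0|_p = 0."""
    r = Fraction(r)
    if r == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_valuation(r, p))

if __name__ == "__main__":
    r, s = Fraction(9, 2), Fraction(18, 5)
    print(padic_abs(r, 3), padic_abs(s, 3), padic_abs(r + s, 3))
```

In this example $r + s = 81/10$, so extra factors of 3 appear in the numerator after adding (the case $a > 0$ of the argument), and $|r + s|_3$ is strictly smaller than $\max\{|r|_3, |s|_3\}$.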

28. For the triangle inequality, let $r, s, t$ be given. Then Problem 27 gives $d(r, t) =
|r - t|_p = |(r - s) + (s - t)|_p \leq \max\{|r - s|_p, |s - t|_p\} \leq |r - s|_p + |s - t|_p =
d(r, s) + d(s, t)$.

29. Part (a) will be illustrated by the more difficult (b) and (c). Multiplication by a
member $r$ of $\mathbb{Q}$ is a uniformly continuous function from $\mathbb{Q}$ into $\mathbb{Q}_p$; in fact, the equality
$|r(s - s_0)|_p = |r|_p |s - s_0|_p$ shows that if $\epsilon$ is given, then the $\delta$ of uniform continuity
can be taken as $|r|_p^{-1}\epsilon$. Proposition 2.47 then tells us how to form products $rs$ for $r$ in
$\mathbb{Q}$ and $s$ in $\mathbb{Q}_p$. For fixed $s$, the result is a uniformly continuous map of $\mathbb{Q}$ into $\mathbb{Q}_p$ since
$|\cdot|_p$ extends continuously to $\mathbb{Q}_p$ and we have $|(r - r_0)s|_p = |r - r_0|_p |s|_p$. A second
application of Proposition 2.47 extends the operation to a mapping of $\mathbb{Q}_p \times \mathbb{Q}_p$ into $\mathbb{Q}_p$
that is uniformly continuous in each variable when the other variable is held fixed. In
fact, it is continuous in both variables since $|rs - r_0 s_0|_p = |(r - r_0)s + r_0(s - s_0)|_p \leq
|r - r_0|_p |s|_p + |r_0|_p |s - s_0|_p \leq |r - r_0|_p |s - s_0|_p + |r - r_0|_p |s_0|_p + |r_0|_p |s - s_0|_p$.

For (c), take a shell $A_{kn} = \{r \in \mathbb{Q}_p \mid p^{-k} \leq |r|_p \leq p^n\}$. This is a closed
subset of $\mathbb{Q}_p$, hence complete. Reciprocal is a mapping from $A_{nk} \cap \mathbb{Q}$ into $A_{kn}$
that is uniformly continuous because $r$ and $s$ in $A_{nk} \cap \mathbb{Q}$ implies $|r^{-1} - s^{-1}|_p =
|(s - r)/rs|_p = |s - r|_p |r|_p^{-1} |s|_p^{-1} \leq p^{2n} |s - r|_p$. Hence reciprocal extends to
a uniformly continuous mapping from $A_{nk}$ to $A_{kn}$. These mappings are consistent
as $n$ and $k$ tend to infinity, and thus reciprocal is a well-defined function from $\mathbb{Q}_p^\times$
to itself. It is continuous because the same computation as just given shows that
$|r^{-1} - r_0^{-1}|_p = |r - r_0|_p |r|_p^{-1} |r_0|_p^{-1}$. If we write $|r|_p \geq \bigl| |r_0|_p - |r - r_0|_p \bigr|$ and require
that $|r - r_0|_p \leq \frac{1}{2}|r_0|_p$, then $|r^{-1} - r_0^{-1}|_p \leq |r - r_0|_p (\frac{1}{2}|r_0|_p)^{-1} |r_0|_p^{-1}$, and continuity
of reciprocal at $r_0$ follows.
The abelian group axioms in (c) are associativity, commutativity, existence of the
two-sided identity 1, and existence of two-sided reciprocals. To complete (c), we
need associativity and commutativity. We can regard associativity as asserting the
equality of two continuous functions from $\mathbb{Q}_p \times \mathbb{Q}_p \times \mathbb{Q}_p$ to $\mathbb{Q}_p$. These are equal on
$\mathbb{Q} \times \mathbb{Q} \times \mathbb{Q}$, and this subset is dense. Hence the two functions are equal everywhere.
Commutativity is proved similarly.

The distributive law in (d) is proved by the same technique used for associativity
in (c). Thus $\mathbb{Q}_p$ is a field.


30. For (a), it is enough to prove that $S = \{t \in \mathbb{Q} \mid |t|_p \leq 1\}$ is totally bounded.
For $x$ in $\mathbb{Q}$, let $C(\delta; x) = \{t \in \mathbb{Q} \mid |t - x|_p \leq \delta\}$. It is enough to show for each
integer $l > 0$ that $S \subseteq \bigcup_{r=0}^{p^l - 1} C(p^{-l}; r)$. If $t$ is given in $S$, $t$ is of the form $t = m/n$
with $m$ and $n$ in $\mathbb{Z}$ and $n$ nondivisible by $p$. Let $n^{-1}$ denote the integer from 0 to
$p^l - 1$ such that $n n^{-1} \equiv 1 \bmod p^l$, and let $r$ denote the integer from 0 to $p^l - 1$ such
that $n^{-1} m \equiv r \bmod p^l$. Then $m - nr \equiv 0 \bmod p^l$, and so $|m - nr|_p \leq p^{-l}$. Since
$|n|_p = 1$, $|\frac{m}{n} - r|_p \leq p^{-l}$. Thus $t$ is in $C(p^{-l}; r)$.
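
The choice of the residue $r$ in part (a) is a modular-inverse computation. As a minimal sketch (the helper name `covering_residue` is hypothetical; the three-argument `pow` with exponent `-1` requires Python 3.8 or later):

```python
def covering_residue(m, n, p, l):
    """Return r in {0, ..., p**l - 1} with m - n*r divisible by p**l.

    Assumes p is prime and does not divide n, so n is invertible mod p**l.
    """
    modulus = p ** l
    n_inv = pow(n, -1, modulus)      # the integer called n^{-1} in the argument
    return (n_inv * m) % modulus

if __name__ == "__main__":
    m, n, p, l = 7, 5, 3, 4
    r = covering_residue(m, n, p, l)
    print(r, (m - n * r) % p ** l)   # the second value is 0
```

Since $p^l$ divides $m - nr$ and $|n|_p = 1$, the rational $t = m/n$ lies in the ball $C(p^{-l}; r)$, as in the argument above.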

For (b), compact sets are closed and bounded by Proposition 2.34a. Conversely
let $E$ be closed and bounded. The set $T = \{t \in \mathbb{Q}_p \mid |t|_p \leq 1\}$ is certainly closed.
Since $\mathbb{Q}_p$ is complete, $T$ is complete. Part (a) shows that $T$ is totally bounded.
By Theorem 2.46, $T$ is compact. The given set $E$ is contained in some set $T_n =
\{t \in \mathbb{Q}_p \mid |t|_p \leq p^n\}$. Multiplication by the member $p^{-n}$ of $\mathbb{Q}_p$ carries $T$ continuously
onto $T_n$, and $T_n$ is compact by Proposition 2.38. Since $E$ is a closed subset of the
compact set $T_n$, Proposition 2.34b shows that $E$ is compact.


31. The first two assertions are routine consequences of (ii), (iii), and (iv). Let
us consider the quotient $\mathbb{Z}_p/P$. We show that $P$ is a maximal ideal. In fact, if $I$ is
an ideal in $\mathbb{Z}_p$ properly containing $P$, then $I$ contains some element $t$ with $|t|_p = 1$.
Then (iii) shows that $t^{-1}$ has $|t^{-1}|_p = 1$ and lies in $\mathbb{Z}_p$. Since $t$ is in $I$ and $t^{-1}$ is
in $\mathbb{Z}_p$, their product 1 is in $I$. Thus $I = \mathbb{Z}_p$. In other words, $P$ is a maximal ideal.
Hence $\mathbb{Z}_p/P$ is a field. To complete the argument, we show that $\mathbb{Z}_p/P$ has exactly $p$
elements. Given $x$ in $\mathbb{Z}_p$, choose $m/n$ in $\mathbb{Q}$ with $|x - \frac{m}{n}|_p \leq p^{-1}$, by denseness of
$\mathbb{Q}$ in $\mathbb{Q}_p$. Here $|\frac{m}{n}|_p \leq 1$, and we may assume that $n$ is nondivisible by $p$. Arguing
as in Problem 30a, we can find $r$ in $\{0, 1, \dots, p - 1\}$ such that $|\frac{m}{n} - r|_p \leq p^{-1}$.
Then $|x - r|_p \leq \max\{|x - \frac{m}{n}|_p, |\frac{m}{n} - r|_p\} \leq p^{-1}$ by the ultrametric inequality. So
$x = (x - r) + r$ with $x - r$ in $P$. Thus $\{0, 1, \dots, p - 1\}$ represents all cosets of $\mathbb{Z}_p/P$.
Finally no two distinct elements $r$ and $r'$ in $\{0, 1, \dots, p - 1\}$ have $|r - r'|_p \leq p^{-1}$
because this inequality would entail having $r - r'$ divisible by $p$.

Chapter III


1. For (a), $|TS|^2 = \sum_j |TS(e_j)|^2 = \sum_j \bigl|\sum_i (S(e_j), e_i)\, T(e_i)\bigr|^2$. Use of the
triangle inequality and then the Schwarz inequality shows that this expression is $\leq
\sum_j \bigl(\sum_i |(S(e_j), e_i)|\, |T(e_i)|\bigr)^2 \leq \sum_j \bigl(\bigl(\sum_i |(S(e_j), e_i)|^2\bigr)\bigl(\sum_i |T(e_i)|^2\bigr)\bigr) =
\sum_j |S(e_j)|^2 |T|^2 = |S|^2 |T|^2$. Part (b) is routine.

2. The member of $L(\mathbb{R}^n, \mathbb{R}^m)$ with matrix $A$.

3. $\limsup_{h \to 0} \bigl(|h|^{-1} |f(h) - 0 - 0|\bigr) \leq \limsup_{h \to 0} \bigl(|h|^{-1} |h|^2\bigr) = 0$.

4. The formula is $\frac{d}{dt} f(x + tu)\big|_{t=0} = \sum_j u_j \frac{\partial f}{\partial x_j}(x)$. The argument is written out
within the proof of Theorem 3.11.


5 ( 0 ) a ') ( cost =) bee ) Gee oe 
Loe)? © Lor)? \—sint cost)? \isine cose )? \ sinhe cosh )* 

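7. The equality is false because the left side is positive and the right side is
negative. In fact, the left side is $\int_0^1 \bigl[\lim_N \int_1^N (e^{-xy} - 2e^{-2xy})\,dx\bigr]\,dy$, which equals
$\int_0^1 \lim_N \bigl[-e^{-xy}/y + e^{-2xy}/y\bigr]_{x=1}^{x=N}\,dy = \int_0^1 y^{-1}\bigl[e^{-y} - e^{-2y}\bigr]\,dy$; since $e^{-y} > e^{-2y}$ on $(0, 1)$,
the left side is $> 0$. Meanwhile, the right side is $\int_1^\infty \bigl[-e^{-xy}/x + e^{-2xy}/x\bigr]_{y=0}^{y=1}\,dx =
\int_1^\infty x^{-1}\bigl[e^{-2x} - e^{-x}\bigr]\,dx$; since $e^{-2x} < e^{-x}$ on $(1, \infty)$, the right side is $< 0$.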
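
The two one-variable integrals obtained in Problem 7 can be approximated to confirm the opposite signs. This is a rough numerical sketch rather than part of the proof: the helper `midpoint` is hypothetical, and the upper limit $+\infty$ is truncated to 50 on the assumption that the exponentially small tail is negligible.

```python
import math

def midpoint(func, a, b, n=20000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Left side: integral over 0 < y < 1 of (e^{-y} - e^{-2y}) / y.
left = midpoint(lambda y: (math.exp(-y) - math.exp(-2 * y)) / y, 0.0, 1.0)

# Right side: integral over x > 1 of (e^{-2x} - e^{-x}) / x, truncated at x = 50.
right = midpoint(lambda x: (math.exp(-2 * x) - math.exp(-x)) / x, 1.0, 50.0)

if __name__ == "__main__":
    print("left  =", left)    # positive
    print("right =", right)   # negative
```

The midpoint rule never evaluates the first integrand at $y = 0$, where the quotient has a removable singularity.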
8. Define $\|\cdot\|_2$ as in Section I.10, and let $f_x(t) = f(x - t)$; the latter definition
is not the one used earlier in the book. For (a), the Schwarz inequality gives

\[ |f * g(x) - f * g(x_0)| = \Bigl|\frac{1}{2\pi} \int_{-\pi}^{\pi} [f(x - t) - f(x_0 - t)]\, g(t)\,dt\Bigr|
\leq \|f_x - f_{x_0}\|_2 \|g\|_2 \leq \|g\|_2 \sup_t |f(x - t) - f(x_0 - t)|, \]

and the right side tends to 0 as $x$ tends to $x_0$ by uniform continuity of $f$. This proves
that $f * g$ is continuous. The periodicity is evident. The proof that $f * g = g * f$ is
the same as the proof in Section I.10 that $f * D_N = D_N * f$.

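For (b), an application of Fubini's Theorem (Corollary 3.33) and a change of
variables gives

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} f * g(x)\, e^{-inx}\,dx = \Bigl(\frac{1}{2\pi}\Bigr)^2 \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(x - t) g(t) e^{-inx}\,dt\,dx
= \Bigl(\frac{1}{2\pi}\Bigr)^2 \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(x - t) g(t) e^{-inx}\,dx\,dt \]
\[ = \Bigl(\frac{1}{2\pi}\Bigr)^2 \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(x) g(t) e^{-in(x + t)}\,dx\,dt
= \Bigl(\frac{1}{2\pi}\Bigr)^2 \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} f(x) e^{-inx} g(t) e^{-int}\,dx\,dt = c_n d_n. \]

For (c), we apply the Weierstrass M test. It is enough to prove that $\sum_n |c_n d_n| <
+\infty$, and the Schwarz and Bessel inequalities together do this:

\[ \sum |c_n d_n| \leq \Bigl(\sum |c_n|^2\Bigr)^{1/2} \Bigl(\sum |d_n|^2\Bigr)^{1/2} \leq \|f\|_2 \|g\|_2 < +\infty. \]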


9. Write out each side as an iterated integral, and apply Fubini’s Theorem (Corol- 
lary 3.33). 


10. For the partial derivatives, $\frac{\partial f}{\partial x}(0, 0) = \frac{d}{dx} f(x, 0)\big|_{x=0} = 0$ and $\frac{\partial f}{\partial y}(0, 0) = 0$
similarly. The fact that $f$ is not continuous at $(0, 0)$ is a special case of Problem 11a.
11. For (a), the homogeneity says in particular that $f(rx) = f(x)$ for $r > 0$ and
$|x| = 1$. Then $\sup_{y \neq 0} |f(y)| = \sup_{|x|=1} |f(x)|$, and the right side is finite, being the
maximum value of a continuous function on a compact set. If $f(y)$ is continuous at
$y = 0$, then $f(0) = \lim_{r \downarrow 0} f(rx) = f(x)$ for every $x$ with $|x| = 1$ and so $f$ must be
constantly equal to $f(0)$.

For (b), $\limsup_{rx \to 0} |f(rx)| = \limsup_{rx \to 0} r^d |f(x)| = 0$ if $d > 0$ since $f(x)$ is
bounded for $|x| = 1$. Thus $f$ is continuous at 0 if $d > 0$ and $f(0) = 0$. If $d < 0$,
then $\limsup_{rx \to 0} r^d |f(x)| = +\infty$ whenever $f(x) \neq 0$.

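For (c), we have $f(rx) = r^d f(x)$ for any $x = (x_1, \dots, x_n) \neq 0$. Put $g =
f \circ m_r$, where $m_r$ refers to multiplication by $r$. The homogeneity gives $g = r^d f$,
and $\frac{\partial g}{\partial x_j}(x) = r^d \frac{\partial f}{\partial x_j}(x)$. On the other hand, a chain rule gives $\frac{\partial g}{\partial x_j}(x) =
\sum_i \frac{\partial f}{\partial x_i}(rx) \frac{\partial (m_r)_i}{\partial x_j}(x) = r \frac{\partial f}{\partial x_j}(rx)$. So $r^d \frac{\partial f}{\partial x_j}(x) = r \frac{\partial f}{\partial x_j}(rx)$, and (c) follows.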


For (d), the given conditions say that $f(t e_j) = t f(e_j)$ for all real $t$. Then $\frac{\partial f}{\partial x_j}(0) =
\lim_{t \to 0} t^{-1}(f(0 + t e_j) - 0) = \lim_{t \to 0} t^{-1} t f(e_j) = f(e_j)$. On the other hand, (c)
says that $\partial f/\partial x_j$ is homogeneous of degree 0, and (a) says that $\partial f/\partial x_j$ cannot be
continuous at 0 unless it is constant.


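12. Part (a) follows from Problem 11b. In (b), $\frac{\partial f}{\partial x}(0) = \frac{d}{dt} f(0 + t(1, 0))\big|_{t=0} =
\frac{d}{dt}\, t\,\big|_{t=0} = 1$ and $\frac{\partial f}{\partial y}(0) = \frac{d}{dt} f(0 + t(0, 1))\big|_{t=0} = \frac{d}{dt}\, 0\,\big|_{t=0} = 0$. The failure of continuity
is by parts (a) and (c) of Problem 11.

For (c), we have $\frac{d}{dt} f(0 + tu)\big|_{t=0} = \frac{d}{dt}\, t \cos^3\theta\,\big|_{t=0} = \cos^3\theta$. If $f$ were differentiable
at $x = 0$, the chain rule would give $\frac{d}{dt} f(0 + tu)\big|_{t=0} = u_1 \frac{\partial f}{\partial x}(0) + u_2 \frac{\partial f}{\partial y}(0) =
\cos\theta$. Since $\cos^3\theta$ is not identically equal to $\cos\theta$, $f$ is not differentiable at 0.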


13. Part (a) follows from (a), (b), and (c) of Problem 11. About 0, the function
$f$ is even in $x$ and even in $y$, and hence the first partial derivatives are odd about 0.
Then part (b) follows from Problem 11d. To calculate the results for (c), we need to
compute $\frac{\partial f}{\partial y}(x, 0)$ for $x \neq 0$ and $\frac{\partial f}{\partial x}(0, y)$ for $y \neq 0$. The first of these is $x$, and the
second is $-y$. The formulas for the second partial derivatives follow.


14. For $n \geq 0$, $r^n e^{in\theta} = (x + iy)^n$ is of class $C^\infty$, and so is $r^n e^{-in\theta} = (x - iy)^n$. For
the first of these functions, $\frac{\partial^2}{\partial x^2}(x + iy)^n = n(n - 1)(x + iy)^{n-2}$, while $\frac{\partial^2}{\partial y^2}(x + iy)^n =
i^2 n(n - 1)(x + iy)^{n-2}$. Hence $\Delta(x + iy)^n = 0$. The result for $(x - iy)^n$ follows
by taking complex conjugates. The final conclusion is a routine consequence of
Theorem 1.37, the complex-valued version of Theorem 1.23, and the fact that each
term is harmonic.
15. This follows by direct calculation. 


xy +x 
x + ye 
(2,2). One checks that g’(1,1) = G a The locally defined inverse function f 


16. In the notation of Theorem 3.17, g(x, y) is ( ), a is (1, 1), and dD is 


near (2, 2) has f’(2, 2) = g’(1, I)! = ‘e ae) nd 2 (2, 2) is the upper left 


entry of this, namely 3/14. 


17. All 6 derivatives of possible interest are given by the matrix product 
2-10\7! /o 0 0 —x/2 
: ZS O-x/2) = £(0 -* |. Then 2(7/2,0) = 0 and #(7/2,0) = 


0 0 0 32/2 
—/12. The function x (u, v) is of class C® by Corollary 3.21. 


18. The map in question is $X \mapsto X^2$ and is the composition of $X \mapsto (X, X)$
followed by $(U, V) \mapsto UV$. Here we can write $UV = L(U)V = R(V)U$, where
$L(U)$ is the linear function "left multiplication by $U$" on matrix space and $R(V)$ is
the linear function "right multiplication by $V$." The derivative of $(U, V) \mapsto UV$ is
then $(R(V) \;\; L(U))$ by Problem 2. Hence the derivative of $X \mapsto X^2$, by the chain
rule, is

\[ \bigl(R(V) \;\; L(U)\bigr) \begin{pmatrix} 1 \\ 1 \end{pmatrix} \Big|_{U=V=X} = \bigl(R(V) + L(U)\bigr)\Big|_{U=V=X} = R(X) + L(X). \]

At $X = 1$, this is $R(1) + L(1)$, which is "multiplication by 2" and is invertible. The
Inverse Function Theorem thus applies.
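
The formula $R(X) + L(X)$ says that the derivative of $X \mapsto X^2$ sends a matrix $H$ to $HX + XH$. A small exact check, with hypothetical helper names and plain nested lists standing in for matrices, uses the identity $(X + H)^2 = X^2 + (XH + HX) + H^2$:

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def square_derivative(x, h):
    """(R(X) + L(X)) applied to H, that is, HX + XH."""
    return add(matmul(h, x), matmul(x, h))

if __name__ == "__main__":
    X = [[1, 2], [3, 4]]
    H = [[0, 1], [1, 0]]
    lhs = matmul(add(X, H), add(X, H))
    rhs = add(add(matmul(X, X), square_derivative(X, H)), matmul(H, H))
    print(lhs == rhs)
```

The remainder $H^2$ is quadratic in $H$, which is why the linear part $HX + XH$ is the derivative. At the identity matrix the linear part reduces to $2H$.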


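19. We may assume that $g'(x_0) \neq 0$, thus that $\frac{\partial g}{\partial x_i}(x_0) \neq 0$ for some $i$. We
take this $i$ to be $i = n$; the other cases involve only notational changes. Write
$x = (x', x_n)$ with $x' \in \mathbb{R}^{n-1}$, and write $x_0 = (a, b)$ similarly. Then the Implicit
Function Theorem produces a real-valued $C^1$ function $h(x')$ defined on an open set
$V$ about the point $a$ in $\mathbb{R}^{n-1}$ such that $h(a) = b$, $g(x', h(x')) = 0$ for all $x'$ in $V$,
and $\frac{\partial h}{\partial x_j}(a) = -\bigl(\frac{\partial g}{\partial x_n}(a, b)\bigr)^{-1} \bigl(\frac{\partial g}{\partial x_j}(a, b)\bigr)$ for $1 \leq j < n$. Let $H(x') = (x', h(x'))$.
Form $f \circ H$, which has a local maximum or minimum at $x' = a$ in $V$. All the first
partial derivatives of this function must be 0 at $x' = a$. Thus, for $1 \leq j \leq n - 1$,
$0 = \frac{\partial (f \circ H)}{\partial x_j}(a) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x_0) \frac{\partial H_i}{\partial x_j}(a)$. Since $H_i(x') = x_i$ for $i < n$, all the terms of
this sum are 0 except possibly for the $j$th and the $n$th. Thus

\[ 0 = \frac{\partial f}{\partial x_j}(x_0) + \frac{\partial f}{\partial x_n}(x_0) \frac{\partial h}{\partial x_j}(a)
= \frac{\partial f}{\partial x_j}(x_0) - \Bigl(\frac{\partial f}{\partial x_n}(x_0)\Bigr) \Bigl(\frac{\partial g}{\partial x_n}(a, b)\Bigr)^{-1} \Bigl(\frac{\partial g}{\partial x_j}(a, b)\Bigr) \]

for $j < n$. The right side is 0 trivially for $j = n$, and thus the result follows with
$\lambda = -\bigl(\frac{\partial f}{\partial x_n}(x_0)\bigr) \bigl(\frac{\partial g}{\partial x_n}(a, b)\bigr)^{-1}$.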

20. The difficulty in handling this inequality as a maximum-minimum problem is 
the question of existence. Lagrange multipliers can constrain matters to a compact 
set, and then existence is no longer an obstacle. The domain D initially will be 


the set where $a_1 \geq 0, \dots, a_n \geq 0$. Fix a number $c$, and let $g(a_1, \dots, a_n) =
\frac{1}{n}(a_1 + \cdots + a_n) - c$ and $f(a_1, \dots, a_n) = \sqrt[n]{a_1 \cdots a_n}$. The subset of $D$ where
$g(a_1, \dots, a_n) = 0$ is compact, and $f$ must have an absolute maximum on it. This
maximum cannot occur where any $a_j$ equals 0 since $f$ is 0 at such points. So it
is at a point in the set $U$ where all $a_j$ are $> 0$. Apply Lagrange multipliers on $U$.
The resulting equations are $\frac{1}{n}(a_1 \cdots a_n)^{1/n}/a_j = -\lambda/n$ for $1 \leq j \leq n$, as well as the
constraint equation $\frac{1}{n}(a_1 + \cdots + a_n) = c$. The first $n$ equations show that all $a_j$'s
must be equal, and the constraint equation shows that they must equal $c$. The desired
inequality is true in this case and hence is true in all cases.
Chapter IV


1. For (a), $\frac{1}{2}y^2 = -\frac{1}{2}t^2 + c$. Adjusting $c$, we have $y^2 = -t^2 + c$. Then
$y(t) = \pm\sqrt{c - t^2}$. For (b), the exceptional points are $(t_0, 0)$. For (c), a solution with
$y(t_0) = y_0$ is $y(t) = \operatorname{sgn}(y_0)\sqrt{y_0^2 + t_0^2 - t^2}$.


2. In Theorem 4.1, take $a = 1$ and $b = 1$. Then $M = 2$ and $a' = \frac{1}{2}$. The theorem
therefore gives a solution for $|t| < 1/2$.


3. To be an integral curve, $(x(t), y(t))$ must satisfy $x'(t) = \sqrt{x}$ and $y'(t) = 1/2$.
Then $2\sqrt{x(t)} = t + c_1$ and $y(t) = \frac{1}{2}t + c_2$. At some unspecified time $t_0$, the curve
is to pass through $(1, 1)$. Then $x(t_0) = 1$ and $y(t_0) = 1$; these force $2 = t_0 + c_1$
and $1 = \frac{1}{2}t_0 + c_2$. So $(x(t), y(t)) = \bigl(\frac{1}{4}(t - t_0 + 2)^2, \frac{1}{2}(t - t_0 + 2)\bigr)$. If $t_0 = 0$, for
example, the curve is $(x(t), y(t)) = \bigl(\frac{1}{4}(t + 2)^2, \frac{1}{2}(t + 2)\bigr)$.


4. This uses the multivariable chain rule, Proposition 3.28b, and the Fundamental 
Theorem of Calculus. The derivative in question is 


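\[ (2t)(1/t^2)\sin(t^3) + \int_0^{t^2} (\partial/\partial t)\bigl(s^{-1}\sin(st)\bigr)\,ds = (2/t)\sin(t^3) + \int_0^{t^2} \cos(st)\,ds \]
\[ = (2/t)\sin(t^3) + \bigl[t^{-1}\sin(st)\bigr]_{s=0}^{s=t^2} = (2/t)\sin(t^3) + t^{-1}\sin(t^3). \]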
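
The closed form in Problem 4 can be compared with a finite-difference derivative. The sketch below assumes, reading back from the computation above, that the function being differentiated is $G(t) = \int_0^{t^2} s^{-1}\sin(st)\,ds$; the names `G` and `closed_form` are illustrative only.

```python
import math

def G(t, n=4000):
    """Midpoint-rule value of the integral of sin(s*t)/s over 0 <= s <= t**2."""
    upper = t * t
    h = upper / n
    return h * sum(
        math.sin((i + 0.5) * h * t) / ((i + 0.5) * h) for i in range(n)
    )

def closed_form(t):
    """(2/t) sin(t^3) + t^{-1} sin(t^3), the expression derived in the solution."""
    return (2 / t) * math.sin(t ** 3) + math.sin(t ** 3) / t

if __name__ == "__main__":
    t, eps = 1.3, 1e-4
    numeric = (G(t + eps) - G(t - eps)) / (2 * eps)   # central difference
    print(numeric, closed_form(t))
```

The first term of the closed form comes from the moving upper limit, and the second from differentiating under the integral sign.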


5. y(t) =24+cye! + c2e”. 


3 1 1 0 a e 
6. For (a), J = and B = for the first,andJ=[{0O i 0 
0 3 pie | 0 0 —i 


0 i -i 
and B = ( 1 O 0) for the second. For (b), the bases are e* (3) and 
0 1 1 


0 i —1 
a ( if? +t 1) \ for the first,ande’{ 1 ],e" (0 },e~” {| 0 | forthe second. 
| : 0 1 1 


Part (b) can be solved directly without solving part (a) first. Consider the 2-by-2 
example. The only root of the characteristic polynomial is 3, and it has multiplicity 2. 


We solve (A —3-1)ko = Oand get ko = i) Then we solve (A—3-1)l9 = 6 


and get lo = boa): Choose any c # 0 and any d, say c = 1 andd = 0. Then 


ky = ( : ) , and lop = a ) , and we obtain the solutions in the form given above. 


For more complicated examples, the choice of these constants can get tricky, but this 
method works quickly for easy examples. 


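7. For $n = 1$, $\det(\lambda - (-a_0)) = \lambda + a_0$. Assume the result for $n - 1$, and expand
the $n$th-order determinant by cofactors about the first column. Then

\[ \det(\lambda 1 - A) = \det\begin{pmatrix}
\lambda & -1 & 0 & \cdots & 0 & 0 \\
0 & \lambda & -1 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & \lambda & -1 \\
a_0 & a_1 & a_2 & \cdots & a_{n-2} & \lambda + a_{n-1}
\end{pmatrix} \]
\[ = \lambda \det\begin{pmatrix}
\lambda & -1 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & \lambda & -1 \\
a_1 & a_2 & \cdots & a_{n-2} & \lambda + a_{n-1}
\end{pmatrix}
+ (-1)^{n-1} a_0 \det\begin{pmatrix}
-1 & 0 & \cdots & 0 \\
\lambda & -1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & \lambda & -1
\end{pmatrix} \]
\[ = \lambda(\lambda^{n-1} + a_{n-1}\lambda^{n-2} + \cdots + a_1) + (-1)^{n-1} a_0 (-1)^{n-1}
= \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0, \]

the next-to-last equality following by induction.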

8. In (a), let $|f_n(t)| \leq M$ for all $t$ and $n$. Then $|F_n(t) - F_n(t')| = \bigl|\int_{t'}^{t} f_n(s)\,ds\bigr| \leq
M|t - t'|$. Thus equicontinuity holds with $\delta = \epsilon/M$.

In (b), we solve the equation explicitly, using variation of parameters. The solutions
of the homogeneous equation are $c_1 \cos t + c_2 \sin t$, and computation shows that
the unique solution of the inhomogeneous equation with the given initial condition
is $y^*(t) = -(\cos t) \int_0^t (\sin s) f(s)\,ds + (\sin t) \int_0^t (\cos s) f(s)\,ds$. Each integral is
equicontinuous by the same argument as in (a), and the operations of multiplication
by a bounded continuous function and addition preserve the equicontinuity.

In (c), we do not know explicit formulas for the solutions of the homogeneous 
equation, but the same argument as in (b) with variation of parameters will work 
anyway. 

10. For any $C^2$ periodic function $f$, the $n$th Fourier coefficient $c_n$ of $f$ has $|c_n| \leq
n^{-2} \sup|f''|$. The function $v(r, \theta)$, being a composition of two $C^2$ functions, is $C^2$ for
$0 < r < 1$ and $|\theta| \leq \pi$, and hence $\sup\bigl|\frac{\partial^2 v}{\partial \theta^2}\bigr|$ is bounded by some $M$ for $0 < r \leq 1 - \delta$.
Then we obtain $|c_n(r)| \leq M/n^2$.

11. The function $(u \circ R_\varphi)(x, y) e^{-ik\varphi}$ is of class $C^2$ jointly in $x, y, \varphi$. By Proposition
3.28 we can pass the second derivatives with respect to $x$ and $y$ under the given integral
sign with respect to $\varphi$. The integrand is harmonic in $(x, y)$ for each $\varphi$, and therefore
the integral itself is harmonic. The integral itself is given by

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} v(r, \theta + \varphi) e^{-ik\varphi}\,d\varphi = \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{n=-\infty}^{\infty} c_n(r) e^{in\theta} e^{in\varphi} e^{-ik\varphi}\,d\varphi. \]

The series in the integrand is uniformly convergent as a function of $\varphi$, by the estimate in
Problem 10 and by the Weierstrass M-test. Theorem 1.31 says that we can interchange
sum and integral, and then the right side above collapses to $c_k(r) e^{ik\theta}$.
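12. Starting from $v(r, \theta) = u(r\cos\theta, r\sin\theta)$, we compute $\frac{\partial v}{\partial r}$ and $\frac{\partial v}{\partial \theta}$ by the chain
rule and obtain

\[ \frac{\partial v}{\partial r} = \cos\theta \frac{\partial u}{\partial x} + \sin\theta \frac{\partial u}{\partial y} \quad\text{and}\quad \frac{\partial v}{\partial \theta} = -r\sin\theta \frac{\partial u}{\partial x} + r\cos\theta \frac{\partial u}{\partial y}. \]

Using the same technique, we form $\frac{\partial^2 v}{\partial r^2}$ and $\frac{\partial^2 v}{\partial \theta^2}$ in terms of the partial derivatives of
$u$, and we find that

\[ \frac{\partial^2 v}{\partial r^2} + \frac{1}{r}\frac{\partial v}{\partial r} + \frac{1}{r^2}\frac{\partial^2 v}{\partial \theta^2} = \Delta u. \]

Substituting $v(r, \theta) = c_k(r) e^{ik\theta}$ and taking into account that $\Delta u = 0$, we obtain

\[ 0 = e^{ik\theta}\bigl(c_k'' + r^{-1} c_k' - k^2 r^{-2} c_k\bigr). \]

Thus $r^2 c_k'' + r c_k' - k^2 c_k = 0$. This is an Euler equation. The solutions are $c_k(r) =
a_k r^{|k|} + b_k r^{-|k|}$ if $k \neq 0$ and are $a_0 + b_0 \log r$ if $k = 0$. Taking into account that
$c_k(r)$ is differentiable at $r = 0$, we obtain $c_k(r) = a_k r^{|k|}$ for all $k$. Substitution gives
$v(r, \theta) = \sum_{k=-\infty}^{\infty} a_k r^{|k|} e^{ik\theta}$.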

13. Since $f_R(\theta) = \sum_{n=-\infty}^{\infty} c_n R^{|n|} e^{in\theta}$ and $P_{r/R}(\theta) = \sum_{n=-\infty}^{\infty} (r/R)^{|n|} e^{in\theta}$, the
result follows immediately from Problem 8b at the end of Chapter III.

15. For (a), substitute $y = uv$, $y' = u'v + uv'$, and $y'' = u''v + 2u'v' + uv''$ into the
equation for $y$, take into account that $u'' + Pu' + Qu = 0$, and get $2u'v' + uv'' + Puv' =
0$. Put $w = v'$. We can rewrite our equation as $w' = (-P - 2u'/u)w$ since $u$
is assumed nonvanishing. Then Problem 14 gives $w(t) = c\,e^{-\int P - 2\int (u'/u)} =
c\,e^{-\int P} e^{-2\log u(t)} = c\,u(t)^{-2} e^{-\int P(t)\,dt}$.

For (b), the formula in (a) gives v(t) = ce 
et 2 ths es 2 ds, 

16. The substitution leads to $uv'' + (2u' + Pu)v' + (u'' + Pu' + Qu)v = 0$. Thus
the condition is $2u' + Pu = 0$. By Problem 14, $u(t)$ is a multiple of $e^{-\frac{1}{2}\int P(t)\,dt}$. The
computation of $R(t)$ is then routine.

17. Substitution of v = ur~!/? shows that L(v) = r!/?Lo(u) with Lo of the 
indicated form. 

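18. For (a), the formula is $d_n = -\sum_{k=1}^{n} c_k d_{n-k}$, with $d_0 = 1$. For (b), we have
$d_1 = -c_1 d_0 = -c_1$, so that $|d_1| = |c_1| \leq M r^1$. Thus $|d_n| \leq M(M + 1)^{n-1} r^n$ for
$n = 1$. Assume that $|d_k| \leq M(M + 1)^{k-1} r^k$ for $1 \leq k < n$. Then $|d_n| \leq \sum_{k=0}^{n-1} |c_{n-k}||d_k| \leq
|c_n| + \sum_{k=1}^{n-1} (M r^{n-k})(M(M + 1)^{k-1} r^k) \leq M r^n + M^2 r^n \sum_{k=1}^{n-1} (M + 1)^{k-1}$. This is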

-P 12 and hence y(t) = u(t)v(t) = 


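\[ = M r^n \Bigl(1 + M \sum_{k=1}^{n-1} (M + 1)^{k-1}\Bigr)
= M r^n \bigl(1 + M((M + 1)^{n-1} - 1)/((M + 1) - 1)\bigr) \]
\[ = M r^n \bigl(1 + (M + 1)^{n-1} - 1\bigr) = M(M + 1)^{n-1} r^n. \]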
For (c), we may assume that $f(0) = 1$. Write $f(x) = \sum_{n=0}^{\infty} c_n x^n$, and define $d_n$ as in
the answer to (a). The estimate in (b) shows that the power series $g(x) = \sum_{n=0}^{\infty} d_n x^n$
has positive radius of convergence, and Theorem 1.40 shows that $f(x)g(x) = 1$ on
the common region of convergence. Then $g(x) = 1/f(x)$, and $1/f(x)$ is exhibited
as the sum of a convergent power series.
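
The recursion in (a) and the estimate in (b) are easy to test on truncated series. The following sketch uses exact fractions; the function names are hypothetical, and the input list `c` is treated as a polynomial, so that coefficients beyond its length are taken to be 0.

```python
from fractions import Fraction

def reciprocal_coefficients(c, count):
    """Coefficients d_0, ..., d_{count-1} of 1/f for f = sum c_n x^n with c_0 = 1,
    using d_0 = 1 and d_n = -(c_1 d_{n-1} + ... + c_n d_0)."""
    if c[0] != 1:
        raise ValueError("normalize so that c_0 = 1")
    d = [Fraction(1)]
    for n in range(1, count):
        top = min(n, len(c) - 1)     # c_k is 0 past the end of the list
        d.append(-sum(Fraction(c[k]) * d[n - k] for k in range(1, top + 1)))
    return d

def cauchy_product(a, b, count):
    """First `count` coefficients of the product of two power series."""
    return [
        sum(a[k] * b[n - k] for k in range(n + 1) if k < len(a) and n - k < len(b))
        for n in range(count)
    ]

if __name__ == "__main__":
    c = [1, 1]                       # f(x) = 1 + x
    d = reciprocal_coefficients(c, 5)
    print(d)
    print(cauchy_product(c, d, 5))
```

For $f(x) = 1 + x$ the recursion reproduces the alternating geometric series, and the Cauchy product of the two coefficient lists is 1 followed by zeros up to the truncation order.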


19. The indicial equation is $s(s - 1) + a_0 s + b_0 = 0$, where $a_0 = P(0)$ and
$b_0 = Q(0)$. Thus $s_1 + s_2 = 1 - a_0$.

In (a), we apply Problem 15a with $u(t) = t^{s_1} \sum_{n=0}^{\infty} c_n t^n$. The expression $P(t)$ in
that problem has become $t^{-1}P(t)$ here, and we obtain $v'(t) = u(t)^{-2} e^{-\int t^{-1} P(t)\,dt}$. In
the integrand of the exponent, we separate the term $-a_0/t$ from the power series, and
we see that $v'(t) = u(t)^{-2} e^{-a_0 \log t} \times$ power series $= t^{-a_0} u(t)^{-2} \times$ power series, the
power series having nonzero constant term since exponentials are nowhere vanishing.
This is of the form $t^{-2s_1 - a_0} \times$ power series as a consequence of Problem 18 and
Theorem 1.40, the power series having nonzero constant term. When this expression
is integrated to form $v(t)$, the $t^{-1}$ produces a logarithm, and the rest produces powers
of $t$. Thus $v(t)$ equals $c \log t + t^{-2s_1 - a_0 + 1} \times$ power series; here the power series has
nonzero constant term. Then $u(t)v(t) = c\,u(t) \log t + t^{s_1} t^{-2s_1 - a_0 + 1} \times$ power series;
once again the power series has nonzero constant term. The exponent of $t$ in the
second term is $-s_1 + 1 - a_0 = -s_1 + (s_1 + s_2) = s_2$, and (a) is done.

In (b), we know that there is only one solution beginning with $t^{s_1}$, and thus we
must have $c \neq 0$ in (a). Another way to see this conclusion is to recognize that the
exponent of $t^{-2s_1 - a_0}$ in $v'(t)$ is just $-1$ since $2s_1 = s_1 + s_2$. Thus the coefficient of
$t^{-1}$ in integrating to form $v(t)$ is not 0, and the logarithm occurs.

In (c), we know from a computation in the text that no series solution begins with
$t^{-p}$ except when $p = 0$, and thus the first argument for (b) applies.


20. When $t = t_{k-1}$ is substituted into the formula valid for $t_{k-1} < t \leq t_k$, we get
$y(t) = y(t_{k-1})$; so the formula is valid also at $t_{k-1}$.

We induct on $k$. For $k = 0$, $y(t_0) = y_0$. Assume inductively for $k > 0$ that
$|y(t_{k-1}) - y(t_0)| \leq M|t_{k-1} - t_0| \leq Ma' \leq b$. For $t_{k-1} < t \leq t_k$, the displayed
formula in the problem implies $|y(t) - y(t_{k-1})| = |F(t_{k-1}, y(t_{k-1}))|\,|t - t_{k-1}|$. Since
$(t_{k-1}, y(t_{k-1}))$ lies in $R'$, $|F|$ is $\leq M$ on it. Thus $|y(t) - y(t_{k-1})| \leq M|t - t_{k-1}| \leq
Ma' \leq b$. If $t_{k-1} < t \leq t_k$, then adding such inequalities gives $|y(t) - y(t_0)| \leq
M|t_1 - t_0| + \cdots + M|t_{k-1} - t_{k-2}| + M|t - t_{k-1}| = M|t - t_0|$ as required. Since
$|t - t_0| \leq a'$, we have $M|t - t_0| \leq Ma' \leq b$. Thus $(t, y(t))$ is in $R'$.


21. We may assume that $t' < t$. If $t'$ and $t$ lie in the same interval $[t_{k-1}, t_k]$ of the
partition, then $y(t) - y(t') = F(t_{k-1}, y(t_{k-1}))(t - t')$. Taking absolute values gives
$|y(t) - y(t')| \leq M|t - t'|$.

Otherwise let $t' \leq t_l < \cdots < t_{k-1} \leq t$. Then each pair of points $(t', t_l), (t_l, t_{l+1}),
\dots, (t_{k-2}, t_{k-1}), (t_{k-1}, t)$ lies in a single interval of the partition. Adding the estimates
for each and taking into account that each difference of $t$ values is $\geq 0$, we obtain
$|y(t) - y(t')| \leq M|t - t'|$.
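
The polygonal approximations of Problems 20 and 21 are the Euler polygons of the differential equation. The sketch below builds the nodes for a uniform partition and can be used to check the Lipschitz-type estimate $|y(t) - y(t')| \leq M|t - t'|$ at the nodes. The function name, the uniform partition, and the sample $F(t, y) = \cos(t + y)$, for which $M = 1$ works, are all illustrative assumptions.

```python
import math

def euler_polygon(F, t0, y0, a_prime, steps):
    """Nodes (t_k, y_k) of the polygon on [t0, t0 + a_prime], where
    y_k = y_{k-1} + F(t_{k-1}, y_{k-1}) * (t_k - t_{k-1}) on a uniform partition."""
    h = a_prime / steps
    nodes = [(t0, y0)]
    for _ in range(steps):
        t, y = nodes[-1]
        nodes.append((t + h, y + F(t, y) * h))
    return nodes

if __name__ == "__main__":
    nodes = euler_polygon(lambda t, y: math.cos(t + y), 0.0, 0.0, 0.5, 50)
    M = 1.0
    worst = max(
        abs(y1 - y2) - M * abs(t1 - t2)
        for t1, y1 in nodes for t2, y2 in nodes
    )
    print(worst <= 1e-12)
```

Because every slope of the polygon is a value of $F$, the estimate at the nodes follows by adding the one-step bounds, exactly as in the second paragraph of Problem 21.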
22. Let $t_{k-1} < t \leq t_k$. Then $\int_{t_0}^{t} y'(s)\,ds = \sum_{j=1}^{k-1} \int_{t_{j-1}}^{t_j} y'(s)\,ds + \int_{t_{k-1}}^{t} y'(s)\,ds =
(y(t_1) - y(t_0)) + \cdots + (y(t_{k-1}) - y(t_{k-2})) + (y(t) - y(t_{k-1})) = y(t) - y_0$,
by an application of the Fundamental Theorem of Calculus on each interval. If
$t_{k-1} < s < t_k$, we have $|y'(s) - F(s, y(s))| = |F(t_{k-1}, y(t_{k-1})) - F(s, y(s))|$. Here
$|s - t_{k-1}| \leq |t_k - t_{k-1}| \leq \delta$ by the choice of the partition. Again by the choice of the
partition, $|y(s) - y(t_{k-1})| \leq M|s - t_{k-1}| \leq M(\delta/M) = \delta$. By the definition of $\delta$ in
terms of $\epsilon$ and the uniform continuity of $F$, we conclude that $|y'(s) - F(s, y(s))| \leq \epsilon$.

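23. We have $\bigl|y(t) - \bigl(y_0 + \int_{t_0}^{t} F(s, y(s))\,ds\bigr)\bigr| = \bigl|\int_{t_0}^{t} [y'(s) - F(s, y(s))]\,ds\bigr| \leq
\int_{t_0}^{t} |y'(s) - F(s, y(s))|\,ds \leq \int_{t_0}^{t} \epsilon\,ds \leq \epsilon|t - t_0| \leq \epsilon a'$.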

24. The statement of Problem 21 proves uniform equicontinuity with $\delta = \epsilon/M$.
If we specialize to $t' = t_0$, it implies uniform boundedness.

25. Let $y(t) = \lim y_{n_k}(t)$ uniformly. The functions $y_{n_k}(t)$ are continuous, and the
uniform limit of continuous functions is continuous. Hence $y(t)$ is continuous. By
Problem 23 we have $\bigl|y_{n_k}(t) - \bigl(y_0 + \int_{t_0}^{t} F(s, y_{n_k}(s))\,ds\bigr)\bigr| \leq \epsilon_{n_k} a'$ for each $k$. We
take the limsup of this expression as $k$ tends to infinity. We know that $y_{n_k}(t)$ tends
uniformly to $y(t)$. Then $y_{n_k}(s)$ tends uniformly to $y(s)$ for $t_0 \leq s \leq t$. By
uniform continuity of $F$, $F(s, y_{n_k}(s))$ tends uniformly to $F(s, y(s))$. By Theorem
1.31, $\int_{t_0}^{t} F(s, y_{n_k}(s))\,ds$ tends to $\int_{t_0}^{t} F(s, y(s))\,ds$.


Chapter V 


1. For (a) and (c), the answer is 2 for 0 < k <n. However, the assertion in (d) 
is false; for a counterexample, take $X = \{1, 2, 3, 4\}$, and let $\mathcal{B}$ consist of all sets with
an even number of elements. For (b), the associativity is proved by observing that
$A \,\Delta\, B \,\Delta\, C$ is the set of all elements that lie in an odd number of the sets $A, B, C$.

2. Let $X = \{1, 2, 3\}$ with the $\sigma$-algebra consisting of all subsets. Take $\rho(\{1\}) =
\rho(\{3\}) = +2$, $\rho(\{2\}) = -3$, $A = \{1, 2\}$, and $B = \{2, 3\}$.

4. This can be worked out carefully, but it is easier to use Problem 3 and apply 
dominated convergence to see that the measure of the left side is $\limsup \mu(E_n)$, and
the measure of the right side is $\liminf \mu(E_n)$.


5. Part (a) is proved the same way as for Lebesgue measure. In (b), the interval $I$
of rationals from 0 to 1 has $\mu(I) = 1$, and it is a countable union of one-point sets
$\{p\}$, each of which has $\mu(\{p\}) = 0$.


6. Argue by contradiction. If $E^c$ is not dense, then there is a nonempty open
interval $U$ in $[0, 1]$ with $U \cap E^c = \varnothing$ and hence $U \subseteq E$. Since $\mu(U) > 0$, we must
have $\mu(E) > 0$.

7. As soon as $\sup \mu(A)$ is known to be finite, $B$ can be constructed as the union
of a sequence of sets whose measures increase to the supremum. Thus assume that
the supremum of $\mu(A)$ over all sets of finite measure is infinite. Then we can choose
a disjoint sequence of sets $A_n$ with each $\mu(A_n)$ finite and with $\sum \mu(A_n) = +\infty$. A
little argument allows us to partition the terms of the series into two subsets, with the
series obtained from each subset divergent. Say the terms of one subset are $\mu(B_i)$
and the terms of the other are $\mu(C_j)$. Since $\sum \mu(B_i) = +\infty$, the hypothesis makes
$\mu\bigl((\bigcup_i B_i)^c\bigr)$ finite. A contradiction arises because $(\bigcup_i B_i)^c \supseteq \bigcup_j C_j$ and $\bigcup_j C_j$
has infinite measure.

8. Consider the set $\mathcal{A}$ of all Borel sets $E$ such that $f^{-1}(E)$ is measurable. The set
$\mathcal{A}$ is closed under complements and countable unions, and it contains all intervals.
So it is a $\sigma$-algebra containing all intervals and must consist of all Borel sets.


10. This problem can be done via dominated convergence, but let us do it from 
scratch in order to be able to quote it in solving Problem 18 and other problems. We 
have 


\[ \Bigl|\int_X f_n\,d\mu - \int_X f\,d\mu\Bigr| \leq \int_X |f_n - f|\,d\mu \leq \mu(X) \sup_x |f_n(x) - f(x)|, \]

and the right side tends to 0 by the uniform convergence. Thus $\lim \int_X f_n\,d\mu = \int_X f\,d\mu$,
the limit existing.

11. In (a) the approximating sets are finite unions of intervals, and we can add
their lengths to obtain $\prod_{k=1}^{n} (1 - r_k)$. Then apply Corollary 5.3. For (b), the set $C^c$
is open, and every point of $C^c$ has an open interval about it where $I_C$ is identically 0;
this proves the continuity at points of $C^c$. To have continuity of $I_C$ at a point $x_0$ of
$C$, we would need $I_C > 1/2$ on some interval about $x_0$, and this would mean that
$I_C$ equals 1 on that interval and hence that the interval is contained in $C$. But $C$
contains no intervals of positive length. Part (c) is handled by the same argument as
(b). For (d), part (c) says that $I_C$ cannot be redefined on a Lebesgue measurable set
of measure 0 so as to be continuous except on a set of measure 0. Theorem 3.29 says
that no $f$ obtained by redefining $I_C$ on a set of Lebesgue measure 0 can be Riemann
integrable. On the other hand, $I_C$ is measurable, being the indicator function of a
compact set, and hence it is Lebesgue integrable.

12. Argue for indicator functions and then simple functions. Then pass to the 
limit to handle nonnegative functions. 

13. Let D be the diagonal. Let B be the set of all subsets of A x A containing 
only countably many members of D. This is a o-ring, and it contains all rectangles 
of the form A x A’, A x B, and B x A with A, A’, and B° in A. If B, denotes the 
set of complements of members of 6, then Proposition 5.37 shows that C = BU B, 
is a o-algebra, and certainly C contains all rectangles AS x A’° with A and A’ in A. 
Therefore C = A x A. If the set D is in A x A, either D or D‘ must be in B, and 
neither is the case. 

14. To prove that R is measurable, one first proves the assertion for simple functions
$\ge 0$ and then passes to the limit. For the rest Fubini’s Theorem gives


\[
\int_{X\times[0,+\infty]} I_R\,d(\mu\times m) = \int_X \Bigl[\int_{[0,+\infty]} I_R(x,y)\,dm(y)\Bigr]\,d\mu(x)
= \int_X \Bigl[\int_{[0,f(x)]} dm(y)\Bigr]\,d\mu(x) = \int_X f(x)\,d\mu(x).
\]

15. This is proved in the same way as Proposition 5.52a.

16. The measure space is the unit interval with Lebesgue measure, and each $f_n$ is
an indicator function. The set of which $f_n$ is the indicator function is the subset of
$\mathbb{R}$ between $\sum_{k=1}^{n-1} 1/k$ and $\sum_{k=1}^{n} 1/k$, written modulo 1, i.e., the set of fractional parts of
each of these rational numbers. The divergence of the series forces these sets to cycle
through the unit interval infinitely often, and thus $f_n(x)$ is 1 infinitely often and 0
infinitely often.

17. From the definition of $E_{MN}$, we see that $\bigcup_N E_{MN} = X$ and $\bigcap_N E_{MN}^c = \emptyset$.
The sets $E_{MN}$ are increasing as a function of $N$, and their complements are
decreasing with empty intersection. Corollary 5.3 produces an integer $C(M)$ such
that $\mu(E_{M,C(M)}^c) < \epsilon/2^M$. Put $E = \bigcup_M E_{M,C(M)}^c$. Then $\mu(E) < \epsilon$ by Proposition
5.1g. If $\epsilon' > 0$ is given, we are to produce $K$ such that $|f_k(x) - f(x)| < \epsilon'$ for
all $k \ge K$ and all $x$ in $E^c$. Choose $M_0$ with $1/M_0 < \epsilon'$. The integer $K$ will be
$C(M_0)$. Since $x$ is in $E^c = \bigcap_M E_{M,C(M)}$, $x$ is in $E_{M_0,C(M_0)}$ in particular. Then
$|f_k(x) - f(x)| < 1/M_0 < \epsilon'$ for $k \ge C(M_0)$.

18. In (a), we may take the set of integration to be X. Let S be the set of
measure 0 on which any of $f_n$ and $f$ is infinite, and redefine all the functions to be 0
on S. Given $\epsilon > 0$, choose $\delta > 0$ by Corollary 5.24 such that $\mu(F) < \delta$ implies
that the integral over $F$ of the bound for the sequence is $< \epsilon$. Let E be as in Egoroff’s
Theorem for the number $\delta$. Problem 10 shows that $\lim \int_{E^c} f_n\,d\mu = \int_{E^c} f\,d\mu$, the limit
existing. Also, $\bigl|\int_E f_n\,d\mu\bigr| \le \int_E |f_n|\,d\mu$, which is at most the integral over $E$ of that bound
and hence is $< \epsilon$ for all n, and similarly for f. Hence $\limsup_n \bigl|\int_X f_n\,d\mu - \int_X f\,d\mu\bigr| \le 2\epsilon$.
Since $\epsilon$ is arbitrary, the result follows.

In (b), consider the measure $g\,d\mu$ and the sequence of functions $\{h_n\}$ with $h_n(x) =
f_n(x)/g(x)$ when $g(x) > 0$, $h_n(x) = 0$ when $g(x) = 0$. After checking that $h_n$ is
measurable, use Corollary 5.28 and apply (a). The constant that bounds the sequence
is 1.

19. By Fatou’s Lemma, $\int_E f\,d\mu \le \liminf_n \int_E f_n\,d\mu$. Subtracting this from
$\int_X f\,d\mu = \lim \int_X f_n\,d\mu$ gives $\int_{E^c} f\,d\mu \ge \limsup_n \int_{E^c} f_n\,d\mu$. Another application
of Fatou’s Lemma gives $\liminf_n \int_{E^c} f_n\,d\mu \ge \int_{E^c} f\,d\mu$, and we conclude that
$\liminf_n \int_{E^c} f_n\,d\mu = \limsup_n \int_{E^c} f_n\,d\mu = \int_{E^c} f\,d\mu$, from which the result follows.

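20. Let $\epsilon > 0$ be given. Choose $\delta > 0$ by Corollary 5.24 such that $\mu(F) < \delta$
implies $\int_F f\,d\mu < \epsilon$. Then choose $E$ with $\mu(E) < \delta$ such that $f_n$ converges to $f$
uniformly off $E$. Problem 10 shows that there is an $N$ such that $\int_{E^c} |f_n - f|\,d\mu < \epsilon$
for $n \ge N$, and Problem 19 shows that there is an $N'$ such that $\int_E |f_n - f|\,d\mu \le
\int_E f_n\,d\mu + \int_E f\,d\mu \le 2\int_E f\,d\mu + \epsilon$ for $n \ge N'$. Since $\mu(E) < \delta$, $2\int_E f\,d\mu + \epsilon <
3\epsilon$. Then $n \ge \max\{N, N'\}$ implies $\int_X |f_n - f|\,d\mu < 4\epsilon$.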

21. Suppose that $\lim \int_X f_n\,d\mu = \int_X f\,d\mu$. Given $\epsilon > 0$, choose $\delta > 0$ by
Corollary 5.24 such that $\mu(E) < \delta$ implies $\int_E f\,d\mu < \epsilon$. Then choose $N$ such that
$N^{-1}(\int_X f\,d\mu + \epsilon) < \delta$. For any $n$, the convergence of $\int_X f_n\,d\mu$ to $\int_X f\,d\mu$ implies
that $N\mu(\{x \mid f_n(x) \ge N\}) \le \int_{\{x \mid f_n(x) \ge N\}} f_n\,d\mu \le \int_X f_n\,d\mu \le \int_X f\,d\mu + \epsilon$ if $n$ is
sufficiently large. Hence $\mu(\{x \mid f_n(x) \ge N\}) \le N^{-1}(\int_X f\,d\mu + \epsilon) < \delta$ for large
$n$, and therefore $\int_{\{x \mid f_n(x) \ge N\}} f\,d\mu < \epsilon$. Problem 20 shows that $\int_X |f_n - f|\,d\mu <
\epsilon$ if $n$ is large enough, and then also $\int_{\{x \mid f_n(x) \ge N\}} |f_n - f|\,d\mu < \epsilon$. So we have
$\int_{\{x \mid f_n(x) \ge N\}} f_n\,d\mu \le \int_{\{x \mid f_n(x) \ge N\}} |f_n - f|\,d\mu + \int_{\{x \mid f_n(x) \ge N\}} f\,d\mu \le \epsilon + \epsilon = 2\epsilon$ for
$n$ large, say $n \ge N'$. By increasing $N$ and taking the integrability of $f_1, \dots, f_{N'-1}$
into account, we can achieve the inequality $\int_{\{x \mid f_n(x) \ge N\}} f_n\,d\mu < 2\epsilon$ for all $n$.
Conversely suppose that $\{f_n\}$ is uniformly integrable. Given $\epsilon > 0$, find the $N$ of
uniform integrability, put $\delta = \epsilon/N$, and choose $E_0$ by Egoroff’s Theorem such that
$\mu(E_0) < \delta$ and $f_n$ converges uniformly off $E_0$. Then $\lim \int_{E_0^c} f_n\,d\mu = \int_{E_0^c} f\,d\mu$ by
Problem 10. Fatou’s Lemma gives $\int_{E_0} f\,d\mu \le \liminf \int_{E_0} f_n\,d\mu$, and we have


\[
\int_{E_0} f_n\,d\mu = \int_{E_0 \cap \{x \mid f_n(x) \ge N\}} f_n\,d\mu + \int_{E_0 \cap \{x \mid f_n(x) < N\}} f_n\,d\mu.
\]


The first term on the right side is $\le \int_{\{x \mid f_n(x) \ge N\}} f_n\,d\mu$, which is $< \epsilon$ by uniform
integrability, and the second term on the right side is $\le N\delta = \epsilon$ because $\mu(E_0) < \delta$
and $f_n(x) < N$ on the set of integration. Thus $\limsup \int_{E_0} f_n\,d\mu \le 2\epsilon$, and we obtain


\[
\limsup_n \Bigl|\int_X f_n\,d\mu - \int_X f\,d\mu\Bigr| \le 4\epsilon.
\]


22. In the notation of Section 5, $\mathcal{K} = \mathcal{U} = \mathcal{A}$ since $\mathcal{A}$ is now assumed to be
a σ-algebra. Thus $\mu_*(E) = \sup_{K \in \mathcal{A},\, K \subseteq E} \mu(K)$ and $\mu^*(E) = \inf_{U \in \mathcal{A},\, U \supseteq E} \mu(U)$.
Take a sequence of sets $K_n$ in $\mathcal{A}$ with $\lim \mu(K_n) = \mu_*(E)$; without loss of generality,
the sets $K_n$ may be assumed increasing. Then we may take $K$ to be the union of the
$K_n$. The construction of $U$ is similar.

The set $K$ is any member of $\mathcal{A}$ such that $\mu(K)$ is the supremum of $\mu(S)$ for all $S$
in $\mathcal{A}$ with $S \subseteq E$. Then $\mu(K^c)$ is the infimum of all $\mu(S^c) = \mu(X) - \mu(S)$ for all
$S^c$ in $\mathcal{A}$ with $S^c \supseteq E^c$. A similar argument applies to $U$ and $U^c$. The result is that
$U^c \subseteq E^c \subseteq K^c$, $\mu_*(E^c) = \mu(U^c)$, and $\mu^*(E^c) = \mu(K^c)$.

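23. Lemma 5.33 gives $\mu(A \cap K) \le \mu_*(A \cap E)$, $\mu(A^c \cap K) \le \mu_*(A^c \cap E)$, and
$\mu_*(E) = \mu(K) = \mu(A \cap K) + \mu(A^c \cap K) \le \mu_*(A \cap E) + \mu_*(A^c \cap E) \le \mu_*(E)$, from
which we obtain $\mu_*(A \cap E) = \mu(A \cap K)$. The argument that $\mu^*(A \cap E) = \mu(A \cap U)$
is similar.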

24. The right side of the definition of $\sigma$ depends only on $A \cap E$ and $B \cap E^c$, and
hence $\sigma$ is well defined. The formulas


\[
\bigcup_n \bigl[(A_n \cap E) \cup (B_n \cap E^c)\bigr] = \Bigl(\bigl(\bigcup_n A_n\bigr) \cap E\Bigr) \cup \Bigl(\bigl(\bigcup_n B_n\bigr) \cap E^c\Bigr)
\]


and $[(A \cap E) \cup (B \cap E^c)]^c = (A^c \cap E) \cup (B^c \cap E^c)$ show that the sets in question
form a σ-algebra $\mathcal{C}$. Taking $A = B$ shows that $\mathcal{A} \subseteq \mathcal{C}$, and taking $A = X$ and $B = \emptyset$
shows that $E$ is in $\mathcal{C}$. Therefore $\mathcal{B} \subseteq \mathcal{C}$, and $\sigma$ is defined on all of $\mathcal{B}$.

The complete additivity of $\sigma$ results from the complete additivity of each of the four
terms in the definition of $\sigma$. Specifically let a disjoint sequence $(A_n \cap E) \cup (B_n \cap E^c)$
be given, and let $A = \bigcup_n A_n$ and $B = \bigcup_n B_n$. We have $\mu_*(A_n \cap E) = \mu(A_n \cap K)$,
and the sets $A_n \cap K$ are disjoint; thus $\sum_n \mu_*(A_n \cap E) = \mu_*(A \cap E)$. The next term
is $\mu^*(A_n \cap E) = \mu(A_n \cap U)$, and the sets $A_n \cap U$ may not be disjoint. However,
$\mu^*(A_m \cap E) + \mu^*(A_n \cap E) = \mu(A_m \cap U) + \mu(A_n \cap U) = \mu^*(A_m \cap A_n \cap E) +
\mu^*((A_m \cup A_n) \cap E)$, and $\mu(A_m \cap A_n \cap U) = \mu^*(A_m \cap A_n \cap E) = \mu^*(\emptyset) = 0$. Thus
the term with $\mu^*(A_n \cap E)$ behaves in additive fashion. Consequently $\mu^*(A \cap E) \ge
\mu^*\bigl((\bigcup_{k=1}^{n} A_k) \cap E\bigr) = \sum_{k=1}^{n} \mu^*(A_k \cap E)$. Letting $n$ tend to infinity gives $\mu^*(A \cap E) \ge
\sum_{k=1}^{\infty} \mu^*(A_k \cap E)$. The reverse inequality follows from Lemma 5.33a, and thus the
term $\mu^*(A_n \cap E)$ is completely additive. The terms with the $B_n$’s are handled similarly,
and $\sigma$ is completely additive.

Taking $A = X$ and $B = \emptyset$, we see immediately that the formula for $\sigma(E)$ is as
asserted.

To prove that $\sigma(A) = \mu(A)$ for $A$ in $\mathcal{A}$, we take $A = B$. Then we see that
$\sigma(A) = t\mu(A \cap K) + (1 - t)\mu(A \cap U) + t\mu(A \cap K^c) + (1 - t)\mu(A \cap U^c) =
t\mu(A) + (1 - t)\mu(A) = \mu(A)$.

25. Each member of the countable set has only countably many ordinals less than 
it, and the countable union of countable sets is countable. Therefore some member 
of $\Omega$ is not accounted for and is an upper bound for the countable set. Application of
(iii) completes the argument. 


27. For (a), if $U_n \uparrow U$ and $V_n \uparrow V$, then $U_n \cup V_n \uparrow U \cup V$ and $U_n \cap V_n \uparrow U \cap V$.
Similar remarks apply to K,. Then the assertion follows by transfinite induction. 

For (b), we know that $\mathcal{K}_\alpha$ is closed under finite unions and intersections, and
we readily see that the complement of any set occurs at most one step later. Now
let an increasing sequence of sets in various $\mathcal{K}_\alpha$’s be given. Say that $U_n$ is in $\mathcal{K}_{\alpha_n}$.
Problem 25 shows that there is a countable ordinal $\alpha_0$ that is $\ge$ all the $\alpha_n$, and then
all the $U_n$ are in $\mathcal{K}_{\alpha_0}$. The union is then in $\mathcal{U}_{\alpha_0+1}$ and necessarily in $\mathcal{K}_{\alpha_0+1}$. Hence the
union is in the union of the $\mathcal{K}_\alpha$’s. So the union of the $\mathcal{K}_\alpha$’s is a σ-algebra and must
contain $\mathcal{B}$. All the set-theoretic operations take place within $\mathcal{B}$, and thus the union
must actually equal $\mathcal{B}$.


28. Proposition 5.2 and Corollary 5.3 show that the value of the measure is deter- 
mined on all the new sets that are constructed in terms of the values on the previous 
sets. Problem 27 shows that all members of $\mathcal{B}$ are obtained by the construction, and
hence $\mu$ is completely determined on $\mathcal{B}$.


29. Same argument as for Problem 27b. 


30. At every stage of taking limits, we have closure under addition and scalar 
multiplication. Pointwise decreasing limits produce the indicator functions of finite 
unions of closed intervals, and pointwise increasing limits of them produce the 
indicator functions of arbitrary finite unions of intervals. Since the constants are 
present as continuous functions, we have the indicator function of every elementary 
set and its complement. These sets form an algebra. Going through the construction 
of Problem 27, we obtain the indicator function of every Borel set. Since we have 
closure under addition and scalar multiplication at each step, we obtain all simple 
functions. One increasing limit gives us all nonnegative Borel measurable functions,
and a subtraction (allowable without another passage to the limit) gives us all Borel
measurable functions. 


32. To see that C has the same cardinality as $\mathbb{R}$, we can make an identification of
the disjoint union of R and a countable set. To do so, we write C as the members of 
[0, 1] whose base-3 expansions involve no 1’s. For each such infinite sequence of 0’s 
and 2’s, we change all the 2’s to 1’s and regard the result as the base-2 expansion of 
some real number. This identification is onto [0, 1], and it is one-one if we discard 
from C all the sequences of 0’s and 2’s that end in all 2’s. 
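
The digit substitution in Problem 32 can be tried on finite truncations. The following is only an illustrative sketch: the function names `cantor_to_binary` and `ternary_value` are hypothetical, and finite digit lists stand in for the infinite expansions used in the argument.

```python
from fractions import Fraction

def cantor_to_binary(digits):
    """Change every ternary digit 2 to 1 and read the result in base 2."""
    if any(d not in (0, 2) for d in digits):
        raise ValueError("Cantor-set expansions use only the digits 0 and 2")
    return sum(Fraction(d // 2, 2 ** (k + 1)) for k, d in enumerate(digits))

def ternary_value(digits):
    """Value of the base-3 expansion 0.d1 d2 d3 ... (finite truncation)."""
    return sum(Fraction(d, 3 ** (k + 1)) for k, d in enumerate(digits))

# 0.0222... and 0.2000... in base 3 are the distinct points 1/3 and 2/3 of C,
# yet both are sent to 1/2; this is why sequences ending in all 2's are discarded.
left = cantor_to_binary([0] + [2] * 20)   # truncation of 0.0222...
right = cantor_to_binary([2] + [0] * 20)  # truncation of 0.2000...
```

With 20 trailing digits, `left` differs from `right` by exactly $2^{-21}$, which shrinks to 0 as the truncation lengthens.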

The standard Cantor set has Lebesgue measure 0, and thus any subset of it is 
Lebesgue measurable of measure 0. The cardinality of this set of subsets is the same 
as the cardinality of the set of subsets of R. In Section A.10 of the appendix, it is 
shown for any set S that the cardinality of S is less than the cardinality of the set of 
all subsets of S. So the cardinality of the set of Lebesgue measurable sets is at least 
that of the set of all subsets of R. 


33. Since $C^c$ is open, any member $x$ of $C^c$ has the property that $I_{C'}$ is 0 on some
open interval about $x$. Thus $I_{C'}$ is continuous at $x$. Since $C$ has Lebesgue measure 0,
$I_{C'}$ is continuous except on a Lebesgue measurable set of measure 0. Theorem 3.29
shows that $I_{C'}$ is Riemann integrable. Hence the cardinality of the set of Riemann
integrable functions is at least that of the set of all subsets of $\mathbb{R}$.


35. If F is the given filter, form the partially ordered set consisting of all filters 
on X containing F, with inclusion as the partial ordering. The union of the members 
of a chain is readily verified to be an upper bound for the chain, and Zorn’s Lemma 
produces a maximal element. This maximal element is readily seen to be an ultrafilter. 


36. The filter in question consists of all supersets of finite intersections of members 
of C. 


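37-38. Suppose that $\mathcal{F}$ is an ultrafilter, $A \cup B$ is in $\mathcal{F}$, $A$ is not in $\mathcal{F}$, and $B$ is not
in $\mathcal{F}$. Let $\mathcal{F}'$ consist of all sets in $\mathcal{F}$ and all sets $B \cap F$ with $F$ in $\mathcal{F}$. Since $B$ is not
in $\mathcal{F}$, $\mathcal{F}'$ properly contains $\mathcal{F}$. Since $\mathcal{F}$ is an ultrafilter, $\mathcal{F}'$ must fail to be a filter. On
the other hand, by inspection, $\mathcal{F}'$ satisfies properties (i) and (ii) in the definition of
filter. We conclude that $\emptyset$ is in $\mathcal{F}'$, hence that there is a set $F$ in $\mathcal{F}$ with $B \cap F = \emptyset$.
Since $\mathcal{F}$ satisfies (ii), the set $(A \cup B) \cap F = (A \cap F) \cup (B \cap F) = A \cap F$ is in $\mathcal{F}$.
By (i), $A$ is in $\mathcal{F}$, contradiction.

Conversely suppose that $\mathcal{F}$ is a filter such that either $A$ or $A^c$ is in $\mathcal{F}$ for each
subset $A$ of $X$. If $\mathcal{F}$ is not maximal, let $B$ be a set that lies in some filter $\mathcal{F}'$ properly
containing $\mathcal{F}$ while $B$ is not itself in $\mathcal{F}$. By hypothesis, $B^c$ is in $\mathcal{F}$ and hence is in $\mathcal{F}'$.
But then $B \cap B^c = \emptyset$ lies in $\mathcal{F}'$, in contradiction to (iii).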


39. If an ultrafilter $\mathcal{F}$ is given, define $\mu(E) = 1$ if $E$ is in $\mathcal{F}$ and define $\mu(E) = 0$
otherwise. Then $\mu$ is defined on all subsets, and we have $\mu(\emptyset) = 0$ and $\mu(X) = 1$.
If $E$ and $E'$ are disjoint, we are to show that


\[
\mu(E) + \mu(E') = \mu(E \cup E').
\]

If $E \cup E'$ is not in $\mathcal{F}$, then all terms in the displayed equation are 0 since $\mathcal{F}$ is closed
under supersets. If $E \cup E'$ is in $\mathcal{F}$, then Problem 37 shows that $E$ or $E'$ is in $\mathcal{F}$; on the
other hand, they cannot both be in $\mathcal{F}$ because $\mathcal{F}$ is closed under finite intersections
and the empty set is not in $\mathcal{F}$. Thus exactly one term on the left side of the displayed
equation is 1, and the right side is 1. This proves additivity.

Conversely if an additive set function $\mu$ is given on all subsets of $X$ that takes only
the values 0 and 1 and is not the 0 set function, let $\mathcal{F}$ consist of the sets $E$ for which
$\mu(E) = 1$. It is immediate that (i) and (iii) hold in the definition of filter. For (ii), let $E$
and $E'$ be in $\mathcal{F}$. Then $E \cup E'$ is in $\mathcal{F}$. Hence $\mu(E \cap E') + 1 = \mu(E) + \mu(E') = 1 + 1$,
and $\mu(E \cap E') = 1$. Hence $\mathcal{F}$ is closed under finite intersections and (ii) holds. Thus
$\mathcal{F}$ is a filter. If $A$ is given, we have $1 = \mu(X) = \mu(A) + \mu(A^c)$, and hence exactly
one of the sets $A$ and $A^c$ is in $\mathcal{F}$. By Problem 38, $\mathcal{F}$ is an ultrafilter.

The statement that complete additivity is equivalent to closure of the ultrafilter
under countable intersections is a routine consequence of Corollary 5.3.


40. This follows from Problems 34d and 35. 


41. Let $S_n$ be the set of all integers $\ge n$. Since $S_1 = X$, $S_1$ is in the ultrafilter.
Since the ultrafilter is not trivial, $\{n\}$ is not in it, and thus Problem 37 shows that $S_n$ is
in it if $S_{n-1}$ is in it. Hence $S_n$ is in the ultrafilter for all $n$. The countable intersection
$\bigcap_n S_n$ is empty, and the empty set is not in any filter. Hence the ultrafilter is not
closed under countable intersections. Corollary 5.3 shows that the corresponding set
function is not completely additive.


43. The proof of Proposition 5.26 shows that the result holds for simple functions
$\ge 0$. If $f \ge 0$ and $g \ge 0$, choose the standard sequences $t_n$ and $u_n$ of simple functions
increasing to $f$ and $g$. These converge uniformly. Hence so does the sum $s_n = t_n + u_n$.
The same argument as for Problem 10 shows that $\lim \int_E s_n\,d\mu = \int_E (f + g)\,d\mu$,
$\lim \int_E t_n\,d\mu = \int_E f\,d\mu$, and $\lim \int_E u_n\,d\mu = \int_E g\,d\mu$. Thus the result holds for
bounded nonnegative $f$ and $g$. The passage to general bounded $f$ and $g$ is achieved
as in Proposition 5.26.


Chapter VI 


1. In additive notation, the sets $E + t$ for $t$ in $T$ are disjoint, and their countable
union is $S^1$. Since Lebesgue measure is translation invariant, these sets all have the
same measure $c$. Then complete additivity gives $c \cdot \infty = 2\pi$, which is impossible.


2. Parts (b) and (c) are easy. For (a), expand the Jacobian determinant $J(N)$
in cofactors about the first row, obtaining two terms, one each from the first two
entries of the first row. The first term is $\cos\theta_1$ times a determinant of size $N - 1$
whose first column has a common factor of $r\cos\theta_1$ and whose second column has
a common factor of $\sin\theta_1$, the remaining part of the determinant being $J(N - 1)$;
thus the first term gives $(r\cos^2\theta_1 \sin\theta_1)J(N - 1)$. The second term is $-(-r\sin\theta_1)$
times a determinant of size $N - 1$ whose first column has a common factor of $\sin\theta_1$
and whose second column has a common factor of $r\sin\theta_1$, the remaining part of the
determinant being $J(N - 1)$; thus the second term gives $(r\sin^3\theta_1)J(N - 1)$. Adding
the two terms gives $J(N) = (r\sin\theta_1)J(N - 1)$, and an induction readily proves the
formula.

3. Replace $f$ in Theorem 6.32 by $f \circ L$, and use $\varphi = L^{-1}$. Since $\varphi'(x) = L^{-1}$
for each $x$, the result follows.

4. In the result of Problem 3, use $L(x) = yx$ and replace $f(z)$ by $f(z)/|\det z|^N$.
Then the left side in Problem 3 is $\int_{M_N} f(yx)/|\det(yx)|^N\,dx$, while the right side is
$|\det L|^{-1}\int_{M_N} f(x)/|\det x|^N\,dx$. Thus $|\det y|^{-N}\int_{M_N} f(yx)/|\det x|^N\,dx =
|\det L|^{-1}\int_{M_N} f(x)/|\det x|^N\,dx$, and the problem reduces to showing that $\det L =
(\det y)^N$. One way of doing this is to verify that this formula is true if $y$ is the matrix
of an elementary row operation and then to multiply the results. But a faster way is
to let $x_1, \dots, x_N$ be the columns of $x$, so that $L(x_1, \dots, x_N) = (yx_1, \dots, yx_N)$. Then
$L$ as a matrix is given in block diagonal form by a copy of $y$ in each block. Hence
$\det L = (\det y)^N$. In a little more detail, the matrix of $L$ is being formed relative to
the following basis of $M_N$: if $E_{ij}$ is the N-by-N matrix with 1 in the $(i, j)$th entry
and 0 elsewhere, the basis is $E_{11}, E_{21}, \dots, E_{N1}, E_{12}, \dots, E_{NN}$.
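
The block diagonal description can be checked on a small example. The sketch below is illustrative only: the helper names `det` and `matrix_of_left_multiplication` and the sample matrix are assumptions made for the check, not part of the text.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def matrix_of_left_multiplication(y):
    """Matrix of L(x) = y x in the basis E_11, E_21, ..., E_N1, E_12, ..., E_NN.

    Listing x column by column makes L block diagonal, with one copy of y
    for each of the N columns of x.
    """
    n = len(y)
    big = [[0] * (n * n) for _ in range(n * n)]
    for block in range(n):
        for i in range(n):
            for j in range(n):
                big[block * n + i][block * n + j] = y[i][j]
    return big

y = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]  # arbitrary sample matrix with N = 3
L = matrix_of_left_multiplication(y)
```

Here `det(y)` is 18 and `det(L)` is $18^3$, in line with $\det L = (\det y)^N$ for $N = 3$.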

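5. For (a), we have, for $n \ne 0$,


\[
2\pi c_n = \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx = \int_{|x| \le 1/|n|} f(x)e^{-inx}\,dx + \int_{1/|n| \le |x| \le \pi} f(x)e^{-inx}\,dx.
\]

Let us call these terms $I$ and $II$. Since $|f(x)| \le C|x|^{\alpha}$ for $|x| \le 1$,

\[
|I| \le \int_{|x| \le 1/|n|} |f(x)|\,dx \le C\int_{|x| \le 1/|n|} |x|^{\alpha}\,dx = \frac{2C}{(\alpha+1)\,|n|^{1+\alpha}}.
\]


For $II$, we use integration by parts and take into account that the terms at $\pi$ and $-\pi$
cancel by periodicity:


\[
\begin{aligned}
II &= \int_{1/|n| \le |x| \le \pi} f(x)e^{-inx}\,dx \\
&= \Bigl[\frac{f(x)e^{-inx}}{-in}\Bigr]_{-\pi}^{-1/|n|} + \Bigl[\frac{f(x)e^{-inx}}{-in}\Bigr]_{1/|n|}^{\pi} + \frac{1}{in}\int_{1/|n| \le |x| \le \pi} f'(x)e^{-inx}\,dx \\
&= \frac{1}{in}\bigl\{f(1/|n|)e^{-in/|n|} - f(-1/|n|)e^{in/|n|}\bigr\} + \frac{1}{in}\int_{1/|n| \le |x| \le \pi} f'(x)e^{-inx}\,dx.
\end{aligned}
\]


Let us call the terms on the right $III$ and $IV$. Since $|f(x)| \le C|x|^{\alpha}$ for $|x| \le 1$,


\[
|III| \le \frac{1}{|n|}\bigl(|f(1/|n|)| + |f(-1/|n|)|\bigr) \le \frac{2C}{|n|^{1+\alpha}}.
\]


The derivation of the formula for $II$, when applied to $f'$ instead of $f$, gives the
following value for $IV$:


\[
IV = -\frac{1}{n^2}\bigl\{f'(1/|n|)e^{-in/|n|} - f'(-1/|n|)e^{in/|n|}\bigr\} - \frac{1}{n^2}\int_{1/|n| \le |x| \le \pi} f''(x)e^{-inx}\,dx.
\]

Let us call the terms on the right $V$ and $VI$. Since $|f'(x)| \le C|x|^{\alpha-1}$ for $|x| \le 1$,

\[
|V| \le \frac{1}{n^2}\bigl(|f'(1/|n|)| + |f'(-1/|n|)|\bigr) \le \frac{2C}{|n|^{1+\alpha}}.
\]


Since $f''(x)$ is bounded for $1 \le |x| \le \pi$, we can write $|f''(x)| \le C'|x|^{\alpha-2}$ for
$0 < |x| \le \pi$, in view of the assumption on $f''$. Therefore


\[
|VI| \le \frac{1}{n^2}\int_{1/|n| \le |x| \le \pi} C'|x|^{\alpha-2}\,dx = \frac{2C'}{n^2}\int_{1/|n|}^{\pi} x^{\alpha-2}\,dx
= \frac{2C'}{(1-\alpha)\,n^2}\bigl(|n|^{1-\alpha} - \pi^{\alpha-1}\bigr) \le \frac{2C'}{(1-\alpha)\,|n|^{1+\alpha}}.
\]


Since $2\pi|c_n| \le |I| + |III| + |V| + |VI|$, we obtain $|c_n| \le K/|n|^{1+\alpha}$.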

For (b), the uniform convergence follows by applying the Weierstrass M-test, and 
the limit is f as a consequence of the uniqueness theorem. 

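In (c), a proof is called for. The crux of the matter is to show, under the assumption
that $f$ is real valued, that the variation $V_\epsilon$ of $f$ on $[\epsilon, 1]$, which gets larger as $\epsilon$ decreases
to 0, is bounded. If $x_0 < \cdots < x_n$ is a partition $P$ of $[\epsilon, 1]$, then


\[
\sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| = \sum_{i=1}^{n} |f'(\xi_i)|(x_i - x_{i-1}) \le C\sum_{i=1}^{n} \xi_i^{\alpha-1}(x_i - x_{i-1})
\]


with $x_{i-1} < \xi_i < x_i$. With $\epsilon$ fixed, the right side is a Riemann sum for the bounded
function $x^{\alpha-1}$ on $[\epsilon, 1]$ and is $\le$ the corresponding upper sum $U(P, x^{\alpha-1}|_{[\epsilon,1]})$. As we
insert points into the partition, the left sides increase and the right sides decrease to the
limit $\int_\epsilon^1 x^{\alpha-1}\,dx = \alpha^{-1}(1 - \epsilon^{\alpha})$. Hence $V_\epsilon \le C\alpha^{-1}(1 - \epsilon^{\alpha})$, and $\sup_\epsilon V_\epsilon \le C/\alpha$.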

6. The distribution function $F$ of $\mu$ must have $F(b) - F(a)$ equal to 0 or 1 for all $a$
and $b$. If $c$ is the supremum of the $x$’s for which there exists $y > x$ with $F(x) < F(y)$,
then $F$ has to be $k$ on $(-\infty, c)$ and $k + 1$ on $[c, +\infty)$ for the value of $k$ that makes
$F(0) = 0$. Hence $\mu$ is a point mass at $c$ with $\mu(\{c\}) = 1$.


7. Let $K$ be compact, and let $f$ and $g$ both be equal to the members of a sequence
$\{f_n\}$ of continuous functions of compact support decreasing to the indicator function
$I_K$ of $K$. Applying the identity to $f_n$ and passing to the limit, we obtain $\nu(K) =
\nu(K)^2$. Thus $\nu(K)$ is 0 or 1 for each compact set. By regularity $\nu$ takes on only the
values 0 and 1 on Borel sets. Then the argument (but not the statement) of Problem 6
applies, and there is some $c$ with $\nu$ equal to a point mass at $c$ with $\nu(\{c\}) = 1$.


8. In (a), if the complement of the set in question is not dense, it omits an open 
set. However, nonempty open sets have positive measure. 

In (b), form $\int_{\mathbb{R}^1} \bigl[\int_{\mathbb{R}^1} I_E(x - t)\,dt\bigr]\,d\mu(x)$. The inner integral equals the Lebesgue
measure of $E$ for every $x$ since Lebesgue measure is invariant under translations and
the map $t \mapsto -t$. Hence the iterated integral is 0. The integral in the other order is
$0 = \int_{\mathbb{R}^1} \bigl[\int_{\mathbb{R}^1} I_E(x - t)\,d\mu(x)\bigr]\,dt = \int_{\mathbb{R}^1} \bigl[\int_{\mathbb{R}^1} I_{E+t}(x)\,d\mu(x)\bigr]\,dt = \int_{\mathbb{R}^1} \mu(E + t)\,dt$,
and Corollary 5.23 shows that $\mu(E + t)$ is 0 almost everywhere.

In (c), the same computation applies, and $\mu(E + t)$ is 0 almost everywhere. Under
the assumption that $\lim_{t \to 0} \mu(E + t)$ exists, the limit must be 0, by (a).

9. Write $1/|x|$ as a sum $F_1 + F_\infty$, where $F_1$ is $1/|x|$ for $|x| \le 1$ and is 0 for
$|x| > 1$. Then $\int_{\mathbb{R}^3} F_\infty(x - y)\,d\mu(y)$ is bounded by $\mu(\mathbb{R}^3)$, and it is enough to
handle the contribution from $F_1$. For that we have $\int_{\mathbb{R}^3} \bigl[\int_{\mathbb{R}^3} F_1(x - y)\,d\mu(y)\bigr]\,dx =
\int_{\mathbb{R}^3} \bigl[\int_{\mathbb{R}^3} F_1(x - y)\,dx\bigr]\,d\mu(y) = \int_{\mathbb{R}^3} \bigl[\int_{\mathbb{R}^3} F_1(x)\,dx\bigr]\,d\mu(y) = \mu(\mathbb{R}^3)\int_{|x| \le 1} |x|^{-1}\,dx$,
and this is finite in $\mathbb{R}^3$. Hence the inner integral $\int_{\mathbb{R}^3} F_1(x - y)\,d\mu(y)$ is finite almost
everywhere.


10. We proceed by induction on $n$, the case $n = 1$ following since finite sets have
Lebesgue measure 0. Assume the result in $n - 1$ variables, and let $P(x_1, \dots, x_n) \not\equiv 0$
be given. Let $E$ be the set where $P = 0$. This is closed, hence Borel measurable
in $\mathbb{R}^n$. Fix $(x_1', \dots, x_n')$ with $P(x_1', \dots, x_n') \ne 0$. The polynomial in one variable
$R(x) = P(x_1', \dots, x_{n-1}', x)$ is not identically 0, being nonzero at $x = x_n'$, and hence
it vanishes only finitely often, say for $x$ in the finite set $F$. Fix $x' \notin F$. Then
the polynomial $Q(x_1, \dots, x_{n-1}) = P(x_1, \dots, x_{n-1}, x')$ in $n - 1$ variables is not
identically 0, being nonzero at $(x_1', \dots, x_{n-1}')$, and its set $E_{x'}$ of zeros has measure 0 by
inductive hypothesis. If $m_n$ denotes n-dimensional Lebesgue measure, then Fubini’s
Theorem applied to $I_E$ gives

\[
m_n(E) = \int_{\mathbb{R}} m_{n-1}(E_{x'})\,dx' = \int_{F} m_{n-1}(E_{x'})\,dx' + \int_{F^c} m_{n-1}(E_{x'})\,dx'.
\]


On the right side the first term is 0 since the 1-dimensional measure of $F$ is 0, while
the second term is 0 since the integrand is 0. Thus $m_n(E) = 0$.

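11. We have

\[
\begin{aligned}
\Gamma(x+y)\int_0^1 t^{x-1}(1-t)^{y-1}\,dt &= \int_0^\infty s^{x+y-1}e^{-s}\,ds \int_0^1 t^{x-1}(1-t)^{y-1}\,dt \\
&= \int_0^\infty \Bigl[\int_0^s u^{x-1}(s-u)^{y-1}e^{-s}\,du\Bigr]\,ds = \int_0^\infty \Bigl[\int_u^\infty u^{x-1}(s-u)^{y-1}e^{-s}\,ds\Bigr]\,du \\
&= \int_0^\infty \Bigl[\int_0^\infty u^{x-1}s^{y-1}e^{-s-u}\,ds\Bigr]\,du = \Gamma(x)\Gamma(y).
\end{aligned}
\]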
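
As a numerical sanity check of the identity in Problem 11, one can compare a quadrature of $\int_0^1 t^{x-1}(1-t)^{y-1}\,dt$ with $\Gamma(x)\Gamma(y)/\Gamma(x+y)$. This is only a sketch: the function names are hypothetical, the midpoint rule is a convenience choice, and the sample exponents are taken $\ge 1$ so that the integrand stays bounded.

```python
import math

def beta_integral(x, y, steps=100_000):
    """Midpoint-rule approximation to the integral of t^(x-1) (1-t)^(y-1) on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (x - 1) * (1 - t) ** (y - 1)
    return h * total

def beta_from_gamma(x, y):
    """Right-hand side Gamma(x) Gamma(y) / Gamma(x + y), using math.gamma."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)
```

For instance, with $x = 2$ and $y = 3$ both quantities are close to $1/12$.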

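12. In Cartesian coordinates we obtain $1^N$, hence 1. In spherical coordinates we
obtain $\Omega_{N-1}\int_0^\infty r^{N-1}e^{-\pi r^2}\,dr$. Putting $\pi r^2 = s$ shows that $\int_0^\infty r^{N-1}e^{-\pi r^2}\,dr =
\int_0^\infty (s/\pi)^{(N-1)/2}e^{-s}\,\tfrac12\pi^{-1/2}s^{-1/2}\,ds = \tfrac12\pi^{-N/2}\Gamma(N/2)$. Hence $\Omega_{N-1} = 2\pi^{N/2}/\Gamma(N/2)$.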
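
The formula for $\Omega_{N-1}$ can be compared with the familiar low-dimensional values and with the radial integral itself. The sketch below is illustrative; the function names and the truncation of the radial integral at `upper = 8.0` are assumptions made for the check.

```python
import math

def sphere_area(n):
    """Area of the unit sphere in R^n from the formula 2 pi^(n/2) / Gamma(n/2)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)

def radial_integral(n, upper=8.0, steps=100_000):
    """Midpoint-rule value of the integral of r^(n-1) exp(-pi r^2) over [0, upper].

    The tail beyond r = 8 is far below double precision, so truncating is harmless.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        total += r ** (n - 1) * math.exp(-math.pi * r * r)
    return h * total
```

For $N = 2$ and $N = 3$ the formula returns $2\pi$ and $4\pi$, and the product `sphere_area(n) * radial_integral(n)` stays close to 1, the value found in Cartesian coordinates.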


13. Part (a) is carried out by showing by induction on $k$ that $\sum_{i=1}^{k} x_i =
1 - \prod_{i=1}^{k} (1 - u_i)$. The case $k = n$ is the desired result.

In (b), let $0 < u_i < 1$ for all $i$. Then $x_i > 0$ for all $i$, and (a) makes it clear that
$\sum_{i=1}^{n} x_i < 1$. Therefore $\varphi$ carries $I$ into $S$. Define $u = \psi(x)$ by the formula in (b).
If all $x_i > 0$ and $\sum_{i=1}^{n} x_i < 1$, then certainly $u_i > 0$. Also, $\sum_{j=1}^{i} x_j < 1$ implies
$x_i < 1 - \sum_{j=1}^{i-1} x_j$, so that $u_i = x_i/(1 - \sum_{j=1}^{i-1} x_j) < 1$. Therefore $\psi$ carries $S$ into $I$.
To complete the proof, we show that $\psi \circ \varphi$ is the identity on $I$ and $\varphi \circ \psi$ is the identity
on $S$. For $\psi \circ \varphi$, we pass from $u$ to $x$ to $v$. Thus we start with $v_i$, substitute the $x$’s,
use the inductive version of (a) to substitute the $u$’s, and then sort matters out to see
that $v_i = u_i$. For $\varphi \circ \psi$, we pass from $x$ to $u$ to $y$. Then we start with $y_i$ and substitute
the $u$’s to obtain $y_i = \bigl(\prod_{j=1}^{i-1} (1 - u_j)\bigr)u_i$. To substitute for the $u$’s in terms of the $x$’s,
we use the inductive version of (a) in the form $\sum_{j=1}^{i-1} y_j = 1 - \prod_{j=1}^{i-1} (1 - u_j)$. This
gives $\bigl(\prod_{j=1}^{i-1} (1 - u_j)\bigr)u_i = \bigl(1 - \sum_{j=1}^{i-1} y_j\bigr)x_i/\bigl(1 - \sum_{j=1}^{i-1} x_j\bigr)$. Then an induction on $i$
shows that $y_i = x_i$, and hence $\varphi \circ \psi$ is the identity on $S$.

In (c), routine computation shows that $\varphi'(u)$ is lower triangular with diagonal
entries $1, (1 - u_1), (1 - u_1)(1 - u_2), \dots, (1 - u_1)\cdots(1 - u_{n-1})$, and hence the
determinant is the product of these diagonal entries. Similarly $\psi'(x)$ is lower triangular
with diagonal entries $1, (1 - x_1)^{-1}, (1 - x_1 - x_2)^{-1}, \dots, (1 - x_1 - x_2 - \cdots - x_{n-1})^{-1}$,
and its determinant is the product of these diagonal entries.
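
The two maps in Problem 13 can be exercised with exact rational arithmetic. The sketch below assumes the formulas $x_i = (1 - u_1)\cdots(1 - u_{i-1})u_i$ and $u_i = x_i/(1 - x_1 - \cdots - x_{i-1})$ quoted in the solution; the function names and the sample point are hypothetical.

```python
from fractions import Fraction

def cube_to_simplex(u):
    """x_i = (1 - u_1) ... (1 - u_{i-1}) u_i."""
    x, running = [], Fraction(1)
    for ui in u:
        x.append(running * ui)
        running *= 1 - ui  # running product of the factors (1 - u_j)
    return x

def simplex_to_cube(x):
    """u_i = x_i / (1 - x_1 - ... - x_{i-1})."""
    u, remaining = [], Fraction(1)
    for xi in x:
        u.append(xi / remaining)
        remaining -= xi  # 1 minus the partial sum of the x_j
    return u

u = [Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)]  # sample point of the open cube
x = cube_to_simplex(u)
```

For this sample point, `x` is $(1/2, 1/6, 1/4)$, its coordinates sum to $11/12 = 1 - \tfrac12\cdot\tfrac23\cdot\tfrac14$ as in (a), and `simplex_to_cube(x)` returns the original `u`.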


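14. The change of variables in Problem 13 gives


\[
\begin{aligned}
\int_S x_1^{a_1-1}\cdots x_n^{a_n-1}\,dx &= \int_I u_1^{a_1-1}\bigl[(1-u_1)u_2\bigr]^{a_2-1}\cdots\bigl[(1-u_1)\cdots(1-u_{n-1})u_n\bigr]^{a_n-1} \\
&\qquad\times (1-u_1)^{n-1}\cdots(1-u_{n-1})\,du \\
&= \int_I u_1^{a_1-1}(1-u_1)^{a_2+\cdots+a_n-(n-1)+(n-1)}\,u_2^{a_2-1} \\
&\qquad\times (1-u_2)^{a_3+\cdots+a_n-(n-2)+(n-2)} \\
&\qquad\times\cdots\times u_{n-1}^{a_{n-1}-1}(1-u_{n-1})^{a_n-1+1}\,u_n^{a_n-1}\,du \\
&= \int_0^1 u_1^{a_1-1}(1-u_1)^{a_2+\cdots+a_n}\,du_1 \int_0^1 u_2^{a_2-1}(1-u_2)^{a_3+\cdots+a_n}\,du_2 \\
&\qquad\times\cdots\times \int_0^1 u_{n-1}^{a_{n-1}-1}(1-u_{n-1})^{a_n}\,du_{n-1} \int_0^1 u_n^{a_n-1}\,du_n.
\end{aligned}
\]

The right side is the product of 1-dimensional integrals of the kind treated in Problem 11.
Substitution of the values from that problem leads to the desired result.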


15. The monotonicity makes possible the estimate of uniform convergence, and 
the continuity then makes the limit continuous. A continuous function is determined 
by its values on a dense set, and $C^c$ is dense.


16. For each $n$, $F_n(x) = 1 - F_n(1 - x)$. Thus $\int_0^1 F_n(x)\,dx = 1 - \int_0^1 F_n(1 - x)\,dx =
1 - \int_0^1 F_n(x)\,dx$ and $\int_0^1 F_n(x)\,dx = \tfrac12$. Passing to the limit and using uniform or
dominated convergence, we obtain $\int_0^1 F(x)\,dx = \tfrac12$.
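
The value $\tfrac12$ can be observed numerically. The sketch below assumes the standard middle-thirds construction (the text may allow more general Cantor sets) and uses the usual recursive description of the piecewise-linear approximations; the function names and the grid size are hypothetical choices.

```python
def cantor_approx(x, n):
    """n-th piecewise-linear approximation F_n for the middle-thirds Cantor set."""
    if n == 0:
        return x
    if x <= 1 / 3:
        return 0.5 * cantor_approx(3 * x, n - 1)
    if x >= 2 / 3:
        return 0.5 + 0.5 * cantor_approx(3 * x - 2, n - 1)
    return 0.5  # constant on the removed middle third

def integral_of_approx(n, steps=3 ** 8):
    """Midpoint-rule integral of F_n over [0, 1] on a grid symmetric about 1/2."""
    h = 1.0 / steps
    return h * sum(cantor_approx((k + 0.5) * h, n) for k in range(steps))
```

Because the midpoint grid is symmetric about $\tfrac12$, the symmetry $F_n(x) = 1 - F_n(1 - x)$ makes the computed sum equal to $\tfrac12$ up to rounding for every $n$.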


18. Use Proposition 6.47. Then u is harmonic by Problem 14 at the end of 
Chapter III. 


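19. Since $P_r$ has $L^1$ norm 1, the inequality $\|u(r, \cdot)\|_p \le \|f\|_p$ follows from
Minkowski’s inequality for integrals. For the limiting behavior as $r$ increases to 1,
we extend $f$ periodically and write


\[
\begin{aligned}
u(r,\theta) - f(\theta) &= \frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\varphi)f(\theta - \varphi)\,d\varphi - f(\theta) \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\varphi)\bigl[f(\theta - \varphi) - f(\theta)\bigr]\,d\varphi,
\end{aligned}
\]


the second step following since $\frac{1}{2\pi}\int_{-\pi}^{\pi} P_r\,d\varphi = 1$. Applying Minkowski’s inequality
for integrals, we obtain


\[
\|u(r, \cdot) - f\|_p \le \frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\varphi)\,\|f(\theta - \varphi) - f(\theta)\|_{p,\theta}\,d\varphi
\]

since $P_r \ge 0$. The integration on the right is broken into two sets, $S_1 = (-\delta, \delta)$ and
$S_2 = [-\pi, -\delta] \cup [\delta, \pi]$, and the integral is


\[
\begin{aligned}
&\le \frac{1}{2\pi}\int_{S_1} P_r(\varphi)\Bigl(\sup_{\varphi \in S_1} \|f(\theta - \varphi) - f(\theta)\|_{p,\theta}\Bigr)\,d\varphi + \frac{1}{2\pi}\int_{S_2} P_r(\varphi)\,2\|f\|_p\,d\varphi \\
&\le \sup_{\varphi \in S_1} \|f(\theta - \varphi) - f(\theta)\|_{p,\theta} + 2\|f\|_p \sup_{\varphi \in S_2} P_r(\varphi).
\end{aligned}
\]


Let $\epsilon > 0$ be given. If $\delta$ is sufficiently small, Proposition 6.16 shows that the first
term is $< \epsilon$. With $\delta$ fixed, we can then choose $r$ close enough to 1 to make the second
term $< \epsilon$.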

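20. For (a), we argue as in Problem 19, taking $S_1$ and $S_2$ to be as in that solution.
Then


\[
\begin{aligned}
|u(r,\theta) - f(\theta)| &\le \frac{1}{2\pi}\int_{-\pi}^{\pi} P_r(\varphi)\,|f(\theta - \varphi) - f(\theta)|\,d\varphi \\
&\le \frac{1}{2\pi}\int_{S_1} P_r(\varphi)\,|f(\theta - \varphi) - f(\theta)|\,d\varphi \\
&\qquad + \frac{1}{2\pi}\int_{S_2} P_r(\varphi)\bigl(\|f\|_\infty + \sup_{\theta \in E} |f(\theta)|\bigr)\,d\varphi \\
&\le \sup_{\varphi \in S_1} |f(\theta - \varphi) - f(\theta)| \\
&\qquad + \bigl(\sup_{\varphi \in S_2} P_r(\varphi)\bigr)\bigl(\|f\|_\infty + \sup_{\theta \in E} |f(\theta)|\bigr),
\end{aligned}
\]


and the uniform convergence follows.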

For (b), the Poisson integral of $f$ is of the form $\sum_{n=-\infty}^{\infty} c_n r^{|n|}e^{in\theta}$, where the $c_n$
are the Fourier coefficients of $f$. Any other harmonic function in the disk is of the
form $\sum_{n=-\infty}^{\infty} c_n' r^{|n|}e^{in\theta}$. Suppose this tends uniformly to $f$ as $r$ increases to 1. Then
the difference is a series $\sum_{n=-\infty}^{\infty} d_n r^{|n|}e^{in\theta}$ that converges uniformly to 0. Then the
integral of the product of this series and $e^{-ik\theta}$ tends to 0. Interchanging integral and
sum, we see that $d_k r^{|k|}$ tends to 0 for each $k$. Therefore $d_k = 0$ for each $k$.

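In (c) since $P_r$ is even,


\[
\begin{aligned}
\int_{-\pi}^{\pi} (P_r * f)(\theta)g(\theta)\,d\theta &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi} P_r(\theta - \varphi)f(\varphi)g(\theta)\,d\varphi\,d\theta \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi} P_r(\theta - \varphi)f(\varphi)g(\theta)\,d\theta\,d\varphi \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi} P_r(\varphi - \theta)f(\varphi)g(\theta)\,d\theta\,d\varphi,
\end{aligned}
\]


and thus $\int_{-\pi}^{\pi} (P_r * f)(\theta)g(\theta)\,d\theta = \int_{-\pi}^{\pi} (P_r * g)(\theta)f(\theta)\,d\theta$. Therefore


\[
\begin{aligned}
\Bigl|\int_{-\pi}^{\pi} (P_r * f)(\theta)g(\theta)\,d\theta - \int_{-\pi}^{\pi} f(\theta)g(\theta)\,d\theta\Bigr| &= \Bigl|\int_{-\pi}^{\pi} \bigl[(P_r * g)(\theta) - g(\theta)\bigr]f(\theta)\,d\theta\Bigr| \\
&\le 2\pi\,\|P_r * g - g\|_1\,\|f\|_\infty.
\end{aligned}
\]


By the previous problem the right side tends to 0 as $r$ increases to 1, and the weak-star
convergence follows.

21. Let $M_f$ and $M_g$ be upper bounds for $|f|$ and $|g|$ on $[a, b]$. Then


\[
\begin{aligned}
\sum_{i=1}^{n} |f(x_i)g(x_i) - f(x_{i-1})g(x_{i-1})|
&\le \sum_{i=1}^{n} |f(x_i)g(x_i) - f(x_i)g(x_{i-1})| + \sum_{i=1}^{n} |f(x_i)g(x_{i-1}) - f(x_{i-1})g(x_{i-1})| \\
&\le M_f \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})| + M_g \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})| \\
&\le M_f\|g\|_{BV} + M_g\|f\|_{BV}.
\end{aligned}
\]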


22. Let us rewrite the given equation $f(x) = f(a) + g_1(x) - g_2(x)$ as
$g_2(x) + f(x) - f(a) = g_1(x)$. If $x_i > x_{i-1}$, then subtraction of the values at $x = x_i$
and at $x = x_{i-1}$ gives $g_2(x_i) - g_2(x_{i-1}) + f(x_i) - f(x_{i-1}) = g_1(x_i) - g_1(x_{i-1})$.
If $f(x_i) - f(x_{i-1}) \ge 0$, then $f(x_i) - f(x_{i-1}) \le g_1(x_i) - g_1(x_{i-1})$ because $g_2$
is monotone; if $f(x_i) - f(x_{i-1}) < 0$, then $0 \le g_1(x_i) - g_1(x_{i-1})$ because $g_1$ is
monotone. Therefore $(f(x_i) - f(x_{i-1}))^+ \le g_1(x_i) - g_1(x_{i-1})$. Summing on $i$ for
a partition of $[a, x]$ gives $\sum_{i=1}^{n} (f(x_i) - f(x_{i-1}))^+ \le g_1(x) - g_1(a)$. If we take
the supremum of the left side and recall that $g_1(a) \ge 0$, we obtain $V^+(f)(x) \le
g_1(x) - g_1(a) \le g_1(x)$. Starting similarly from $g_1(x) - f(x) + f(a) = g_2(x)$ and
arguing in the same way, we obtain $V^-(f)(x) \le g_2(x) - g_2(a) \le g_2(x)$.

23. Suppose that $V^+(f)$ and $V^-(f)$ are both discontinuous at some $x$. Then
$V^+(f)(x^-) + \epsilon \le V^+(f)(x^+)$ and $V^-(f)(x^-) + \epsilon \le V^-(f)(x^+)$ for some $\epsilon > 0$.
Define


\[
g_1(y) = \begin{cases} V^+(f)(y) & \text{for } y < x, \\ V^+(f)(x^-) & \text{for } y = x, \\ V^+(f)(y) - \epsilon & \text{for } y > x; \end{cases}
\]


and define $g_2(y)$ similarly except that $V^-$ replaces $V^+$. Then $g_1$ and $g_2$ are both
nonnegative, and $g_1 - g_2 = V^+(f) - V^-(f) = f - f(a)$. If $g_1$ and $g_2$ are shown
to be monotone, then Problem 22 leads to the contradiction $g_1(y) < V^+(f)(y)$ for
$y > x$, and we conclude that $V^+(f)$ and $V^-(f)$ could not have been discontinuous.

In proving monotonicity for $g_1$, it is necessary to make comparisons only of $x$ with
other points $y$. Let $h > 0$. For points $y > x$, we have $g_1(x + h) = V^+(f)(x + h) - \epsilon
\ge V^+(f)(x^+) - \epsilon \ge V^+(f)(x^-) = g_1(x)$. For points $y < x$, we have $g_1(x - h) =
V^+(f)(x - h) \le V^+(f)(x^-) = g_1(x)$. Monotonicity for $g_2$ is proved in the same
way.


24. The proof is similar in spirit to the proof of Proposition 6.54. 

25. For $f$, let $y_n = (n + \tfrac12)^{-1}\pi^{-1}$, so that $f(y_n)$ is $+(n + \tfrac12)^{-1}\pi^{-1}$ if $n$ is even
and is $-(n + \tfrac12)^{-1}\pi^{-1}$ if $n$ is odd. Compute the sum of the absolute values of the
differences of values of $f$ at $y_N, y_{N-1}, \dots, y_1$ and see that this is unbounded as a
function of $N$. The function $g$ has a bounded derivative (even though the derivative
is discontinuous), and this is enough to imply bounded variation.


Chapter VII


1. If $g(a_k) < g(b_k)$, then $a_k$ would have to be in $E$. For the second part an example
is $g(x) = x$ on $[0, 1]$; there is only one interval $(a_1, b_1)$, and it is $(0, 1)$.

2. No. Corollary 7.4 applied to $I_E$ shows for almost all $x$ that the quotient
$m(E \cap (x - h, x + h))/m((x - h, x + h))$ has to tend to 0 or 1 as $h$ decreases to 0.


3. We may work on a bounded interval $I$. Let $\epsilon > 0$ be given. If $x$ is in $E$, then
$|h^{-1}(F(x + h) - F(x))| \le \epsilon$ whenever $|h| \le \delta_x$ for some $\delta_x$ depending on $x$. For
each such $x$, fix a positive number $r_x$ with $r_x \le \tfrac15\delta_x$. Associate the set $B(r_x; x)$ to $x$.
Then


\[
\mu(B(5r_x; x)) \le \mu((x - 5r_x, x + 5r_x]) = F(x + 5r_x) - F(x - 5r_x) \le 10 r_x \epsilon.
\]


Applying Wiener’s Covering Lemma, we can find disjoint sets $B(r_{x_i}; x_i)$ with $E \subseteq
\bigcup_{i=1}^{\infty} B(5r_{x_i}; x_i)$. Then


\[
\mu(E) \le \sum_{i=1}^{\infty} \mu(B(5r_{x_i}; x_i)) \le 5\epsilon\sum_{i=1}^{\infty} 2r_{x_i} = 5\epsilon\sum_{i=1}^{\infty} m(B(r_{x_i}; x_i)) \le 5\epsilon\,m(I).
\]


Since $I$ is fixed and $\epsilon$ is arbitrary, $\mu(E) = 0$.


4. If $F$ is the function in question, $F - F(0)$ is the distribution function of
some Stieltjes measure $\mu$ containing no point masses. Proposition 7.8 shows that
$\mu(E^c) = 0$ for some countable set $E$. Since $\mu(\{p\}) = 0$ for each point $p$, $\mu(E) = 0$
by complete additivity. Thus $\mu = 0$, and $F$ must be constant.


5. For (a), the construction shows that $F'(x) = 0$ for all $x \in C^c$. Then Proposition
7.8 allows us to conclude that $\mu$ is singular.

For (b), let $F_n$ be the $n$th constructed approximation to $F$ (using straight-line
interpolations), and let $f_n$ be its derivative (defined except on a finite set and put
equal to 0 there). The function $f_n$ is a multiple $c_n$ of the indicator function of the
subset $C_n$ of $[0, 1]$ that remains after the first $n$ steps of the construction, and also
$m(C_n) = \prod_{k=1}^{n} (1 - r_k)$. Since $F_n(x) = \int_0^x f_n(t)\,dt$ for all $x$, we have $1 = F_n(1) =
c_n\int_0^1 I_{C_n}(t)\,dt = c_n\prod_{k=1}^{n} (1 - r_k)$. Therefore $f_n = \bigl(\prod_{k=1}^{n} (1 - r_k)\bigr)^{-1}I_{C_n}$. Put $f =
P^{-1}I_C$. The functions $f_n$ converge pointwise to $f$, and they are uniformly bounded
by the constant function $P^{-1}$. By dominated convergence, $F(x) = \int_0^x f(t)\,dt$ for
$0 \le x \le 1$. Therefore $F$ is the distribution function of the measure $f(t)\,dt$.

6. Let $E$ be the second described set. The complement of $E$ has measure 0 by
Corollary 7.4. Fix $x$ in $E$, and let $\epsilon > 0$ be given. Choose a rational $r$ such that
$|r - f(x)| < \epsilon$. For $h > 0$,


\[
h^{-1}\int_x^{x+h} |f(t) - f(x)|\,dt \le h^{-1}\int_x^{x+h} |f(t) - r|\,dt + h^{-1}\int_x^{x+h} |r - f(x)|\,dt.
\]


The second term on the right side equals $|r - f(x)| < \epsilon$, and the first term tends to
$|f(x) - r| < \epsilon$ since $x$ is in $E$. A similar argument applies if $h < 0$.

7. Part (a) is routine, and part (b) follows by adapting part of the argument for
Theorem 6.48. In (c), the assumption that $x$ is in the Lebesgue set implies that
$\int_{|t| \le h} |f(x - t) - f(x)|\,dt \le h\,c_x(h)$ for $h > 0$, where $c_x(\cdot)$ is a function that
tends to 0 as $h$ decreases to 0. For each of the described pieces of the integral
$\int_{-\pi}^{\pi} K_N(t)|f(x - t) - f(x)|\,dt$, we use one of the two estimates in (a), specifically
the estimate $K_N(t) \le N + 1$ for the piece with $|t| \le 1/N$ and the estimate
$K_N(t) \le c/(Nt^2)$ for all the other pieces. The piece for $|t| \le 1/N$ then contributes
< (N+1) Sicayn [f(x — t) — f(x)|dt < 2c,(1/N), the piece for 2''/N < 
It] < 2*/N contributes < § (2°°'/N)* fuajyeyeoryy If — 1) — f@)ldt < 
< (2k 1/N)-?(2*/N)cy(2k/N) = 4- 27*c,(2*/N), and finally the piece for 
NOUS < |e) Se -contributes “= EN [acy FOO = 1) = F@)lae = 
7 N+P Or (fll, + |f(x)|). The sum of the estimates is 


[N34] 

<2e,(1/N) + D> 4-2-*e,(2*/N) + 2eN7'P( fly HLF DD 
k=1 

<4 sup c(h)+eN 7 (Fl, +1f@D, 


O<h<N-1/4 


and this tends to 0 as A decreases to 0. (The use of the shells with 2~* is a device 
that appears frequently in Zygmund’s Trigonometric Series and may be regarded as 
a kind of manual integration by parts.) 


8. Since $\mu$ is singular, find a Borel set E with $\mu(E) = 0$ and $m(E^c) = 0$. Let
$\epsilon > 0$ be given. By regularity of $m + \mu$, choose an open set U containing E such
that $(m + \mu)(U - E) < \epsilon$. Then $\mu(U) \le \mu(U - E) + \mu(E) = \mu(U - E) < \epsilon$, and
$m(U^c) \le m(E^c) = 0$.

9. About each x in U, there is some $\delta(x)$ such that $(x - h, x + h) \subseteq U$ for
$h < \delta(x)$. Then $\nu((x - h, x + h)) = 0$ for $h < \delta(x)$, and the limit of this is 0 as h
decreases to 0.

11. Since U is open and $\mu_2(U) = 0$, Problem 9 gives

$$\lim_{h \downarrow 0} (2h)^{-1} \mu_2((x - h, x + h)) = 0$$

for all x in U. Since $m(U^c) = 0$, $\lim_{h \downarrow 0} (2h)^{-1} \mu_2((x - h, x + h)) = 0$ for almost
every x in $\mathbb{R}^1$. The measure $\mu_1$ has $\mu_1(\mathbb{R}^1) = \mu(U) < \epsilon$, and Problem 10 shows that

$$m\{x \mid \limsup_{h \downarrow 0} (2h)^{-1} \mu_1((x - h, x + h)) > \xi\}
 \le m\{x \mid \sup_{h > 0} (2h)^{-1} \mu_1((x - h, x + h)) > \xi\} \le 5\mu_1(\mathbb{R}^1)/\xi < 5\epsilon/\xi.$$

12. It is enough to handle the case that $\mu$ vanishes outside some interval and hence
has $\mu(\mathbb{R}^1)$ finite. Combining the estimates for $\mu_1$ and $\mu_2$ gives

$$m\{x \mid \limsup_{h \downarrow 0} (2h)^{-1} \mu((x - h, x + h)) > \xi\} < 5\epsilon/\xi.$$

Since $\epsilon$ is arbitrary, $m\{x \mid \limsup_{h \downarrow 0} (2h)^{-1} \mu((x - h, x + h)) > \xi\} = 0$. Taking the union
for $\xi = 1/n$, we conclude that the set where $\limsup_{h \downarrow 0} (2h)^{-1} \mu((x - h, x + h)) > 0$ has
measure 0.

To get the better conclusion, the main step is to obtain a bound $10\epsilon/\xi$ for the
maximal function formed from the supremum of $h^{-1}\nu((x, x + h))$ or $h^{-1}\nu((x - h, x))$. The
proof of Corollary 6.40 shows how to derive this from Problem 10.


Chapter VIII


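1. Let $\mathcal{F}$ be the Fourier transform as defined in the text. In each part of the
problem, $\alpha$ can be computed by relating matters to the known facts about $\mathcal{F}$, and $\beta$
can be computed directly from the definitions and Fubini's Theorem.

In (a), we have $\hat{f}(y) = \int f(x) e^{-ix \cdot y}\,dx = \int f(x) e^{-2\pi i x \cdot (y/(2\pi))}\,dx = \mathcal{F}f(y/(2\pi))$.
To obtain $f(x) = \alpha \int \hat{f}(y) e^{ix \cdot y}\,dy$, we want
$f(x) = \alpha \int \mathcal{F}f(y/(2\pi)) e^{ix \cdot y}\,dy = (2\pi)^N \alpha \int \mathcal{F}f(y') e^{2\pi i x \cdot y'}\,dy' = (2\pi)^N \alpha f(x)$.
With $f * g(x) = \beta \int f(x - t) g(t)\,dt$, we have
$\widehat{f * g}(y) = \beta \iint f(x - t) g(t) e^{-ix \cdot y}\,dt\,dx = \beta \iint f(x - t) g(t) e^{-ix \cdot y}\,dx\,dt
 = \beta \iint f(x) g(t) e^{-i(x + t) \cdot y}\,dx\,dt = \beta \hat{f}(y) \hat{g}(y)$. Thus $\alpha = (2\pi)^{-N}$ and $\beta = 1$.

In (b), we find similarly that $\hat{f}(y) = (2\pi)^{-N} \mathcal{F}f(y/(2\pi))$, and we are led to
$(2\pi)^N (2\pi)^{-N} \alpha = 1$. So $\alpha = 1$. Also, $\beta (2\pi)^{-N} = (2\pi)^{-2N}$ and $\beta = (2\pi)^{-N}$.

In (c), we find similarly that $\alpha = (2\pi)^{-N/2}$ and $\beta = (2\pi)^{-N/2}$. This normalization
has the property that $\alpha$ and $\beta$ are both 1 if dx is replaced by $dx/(2\pi)^{N/2}$ throughout.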

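2. This is an operation called "polarization" in linear algebra, and it will be
explained further in Chapter XII. Application of the Plancherel formula to $f + cg$,
f, and cg gives $\|f + cg\|_2^2 = \|\mathcal{F}(f) + c\mathcal{F}(g)\|_2^2$, $\|f\|_2^2 = \|\mathcal{F}(f)\|_2^2$, and
$\|cg\|_2^2 = \|c\mathcal{F}(g)\|_2^2$. We expand the first one in terms of the inner product and subtract the
other two to obtain

$$(f, cg)_2 + (cg, f)_2 = (\mathcal{F}(f), c\mathcal{F}(g))_2 + (c\mathcal{F}(g), \mathcal{F}(f))_2.$$

Then $\bar{c}(f, g)_2 + c\,\overline{(f, g)_2} = \bar{c}(\mathcal{F}(f), \mathcal{F}(g))_2 + c\,\overline{(\mathcal{F}(f), \mathcal{F}(g))_2}$. Taking c = 1
gives $2\,\mathrm{Re}(f, g)_2 = 2\,\mathrm{Re}(\mathcal{F}(f), \mathcal{F}(g))_2$, whereas taking c = i gives
$2\,\mathrm{Im}(f, g)_2 = 2\,\mathrm{Im}(\mathcal{F}(f), \mathcal{F}(g))_2$. The result follows.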


3. For any f in i. we have Q, * (OQ. * f) = P:+- * f because the Fourier 
transforms are equal. Also, (Q, * QO.) * f = Q,*(Q,' * f) since we have finiteness 
when the functions are replaced by their absolute values. Moreover, the functions 
Q.*Q, and P,,, are bounded and continuous. Letting f run through an approximate 
identity formed with respect to dilations and applying Theorem 6.20c, we see that 
Q, * Oe(x) = Pete (x) for all x. 

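4. Since $P_t$ is even,
$\int_{\mathbb{R}^N} (P_t * f)(x) g(x)\,dx = \int_{\mathbb{R}^N} \int_{\mathbb{R}^N} P_t(x - y) f(y) g(x)\,dy\,dx
 = \int_{\mathbb{R}^N} \int_{\mathbb{R}^N} P_t(x - y) f(y) g(x)\,dx\,dy = \int_{\mathbb{R}^N} \int_{\mathbb{R}^N} P_t(y - x) f(y) g(x)\,dx\,dy$,
and thus $\int_{\mathbb{R}^N} (P_t * f)(x) g(x)\,dx = \int_{\mathbb{R}^N} (P_t * g)(x) f(x)\,dx$. Therefore

$$\Big| \int_{\mathbb{R}^N} (P_t * f)(x) g(x)\,dx - \int_{\mathbb{R}^N} f(x) g(x)\,dx \Big|
 = \Big| \int_{\mathbb{R}^N} [(P_t * g)(x) - g(x)] f(x)\,dx \Big| \le \|P_t * g - g\|_1 \|f\|_\infty.$$

By Theorem 8.19c the right side tends to 0 as t decreases to 0, and (a) follows.

For (b), part (a) shows for each g with $\|g\|_1 \le 1$ that
$\big| \int_{\mathbb{R}^N} f(x) g(x)\,dx \big| = \lim_{t \downarrow 0} \big| \int_{\mathbb{R}^N} (P_t * f)(x) g(x)\,dx \big|$.
Since $\big| \int_{\mathbb{R}^N} (P_t * f)(x) g(x)\,dx \big| \le \|P_t * f\|_\infty \|g\|_1 \le \|P_t * f\|_\infty$, we have

$$\Big| \int_{\mathbb{R}^N} f(x) g(x)\,dx \Big| \le \liminf_{t \downarrow 0} \|P_t * f\|_\infty$$

whenever $\|g\|_1 \le 1$. For any $\epsilon > 0$ with $\|f\|_\infty - \epsilon > 0$, let $S_\epsilon$ be the set where $|f|$ is
$\ge \|f\|_\infty - \epsilon$. Then $m(S_\epsilon) > 0$. Take E to be any subset of $S_\epsilon$ with $0 < m(E) < +\infty$,
and let g(x) be $m(E)^{-1} \overline{f(x)}/|f(x)|$ on E and zero elsewhere. This function has
$\|g\|_1 \le 1$. Then $\big| \int_{\mathbb{R}^N} fg\,dx \big| = \int_{\mathbb{R}^N} fg\,dx = m(E)^{-1} \int_E |f|\,dx \ge \|f\|_\infty - \epsilon$.
Hence $\|f\|_\infty - \epsilon \le \big| \int_{\mathbb{R}^N} fg\,dx \big| \le \liminf_{t \downarrow 0} \|P_t * f\|_\infty$. Since $\epsilon$ is arbitrary,
$\|f\|_\infty \le \liminf_{t \downarrow 0} \|P_t * f\|_\infty$. On the other hand, Theorem 8.19b shows that
$\|P_t * f\|_\infty \le \|f\|_\infty$. So we have
$\|f\|_\infty \le \liminf_{t \downarrow 0} \|P_t * f\|_\infty \le \limsup_{t \downarrow 0} \|P_t * f\|_\infty \le \|f\|_\infty$.
Equality must hold throughout, and (b) is thereby proved.
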

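5. In (a), the set function is a measure by Corollary 5.27. It has $(\mu_1 * \mu_2)(\mathbb{R}^N)$ equal to
$\mu_1(\mathbb{R}^N)\mu_2(\mathbb{R}^N)$ and is therefore a Borel measure. If $\mu_1 = f\,dx$ and $\mu_2 = \mu$, then

$$(f\,dx * \mu)(E) = \int_{\mathbb{R}^N} (f\,dx)(E - x)\,d\mu(x) = \int_{\mathbb{R}^N} \int_{E - x} f(y)\,dy\,d\mu(x)$$
$$= \int_{\mathbb{R}^N} \int_{\mathbb{R}^N} I_{E - x}(y) f(y)\,dy\,d\mu(x) = \int_{\mathbb{R}^N} \int_{\mathbb{R}^N} I_E(x + y) f(y)\,dy\,d\mu(x)$$
$$= \int_{\mathbb{R}^N} \int_{\mathbb{R}^N} I_E(y) f(y - x)\,dy\,d\mu(x) = \int_{\mathbb{R}^N} \int_E f(y - x)\,dy\,d\mu(x)$$
$$= \int_E \Big[ \int_{\mathbb{R}^N} f(y - x)\,d\mu(x) \Big]\,dy.$$

In (b), we start with an indicator function and compute that

$$\int_{\mathbb{R}^N} \int_{\mathbb{R}^N} I_E(x + y)\,d\mu_1(x)\,d\mu_2(y) = \int_{\mathbb{R}^N} \Big[ \int_{\mathbb{R}^N} I_{E - y}(x)\,d\mu_1(x) \Big]\,d\mu_2(y)$$
$$= \int_{\mathbb{R}^N} \mu_1(E - y)\,d\mu_2(y)$$
$$= (\mu_1 * \mu_2)(E) = \int_{\mathbb{R}^N} I_E\,d(\mu_1 * \mu_2).$$

Then we pass to simple functions $\ge 0$, use monotone convergence, and finally take
linear combinations to get $\int_{\mathbb{R}^N} \int_{\mathbb{R}^N} g(x + y)\,d\mu_1(x)\,d\mu_2(y) = \int_{\mathbb{R}^N} g\,d(\mu_1 * \mu_2)$.

In (c), we actually have $\|P_t * \mu\|_1 = \mu(\mathbb{R}^N)$ for every t > 0 by Fubini's Theorem.

Part (d) is handled in the same way as Problem 4a. First one shows that
$\int_{\mathbb{R}^N} (P_t * \mu)(x) g(x)\,dx = \int_{\mathbb{R}^N} (P_t * g)(x)\,d\mu(x)$ for g in $C_{\mathrm{com}}(\mathbb{R}^N)$. The resulting
estimate is $\big| \int_{\mathbb{R}^N} [(P_t * g)(x) - g(x)]\,d\mu(x) \big| \le \|P_t * g - g\|_{\sup}\,\mu(\mathbb{R}^N)$, and then (d)
follows from Theorem 8.19d.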

6. Part (a) follows from the same argument as for Proposition 8.1a. In (b), the
measure $\delta$ with $\delta(\{0\}) = 1$ and $\delta(\mathbb{R}^N - \{0\}) = 0$ has $\hat{\delta}(y) = 1$ for all y. In (c), we
use the result of Problem 5b with $g(x) = e^{-2\pi i x \cdot y}$ and get
$\int e^{-2\pi i x \cdot y}\,d(\mu_1 * \mu_2)(x) = \iint e^{-2\pi i (x + t) \cdot y}\,d\mu_1(x)\,d\mu_2(t) = \hat{\mu}_1(y)\hat{\mu}_2(y)$.
In (d), let $\varphi(x) = P_1(x)$. Then $\hat{\mu} = 0$ implies $\widehat{\varphi_\epsilon * \mu} = 0$ for every $\epsilon > 0$.
Since $\varphi_\epsilon * \mu$ is a function, Corollary 8.5 gives $\varphi_\epsilon * \mu = 0$ for every $\epsilon > 0$.
By Problem 5d, $\varphi_\epsilon * \mu$ converges weak-star to $\mu$ against $C_{\mathrm{com}}(\mathbb{R}^N)$. Therefore
$\int_{\mathbb{R}^N} g\,d\mu = 0$ for every g in $C_{\mathrm{com}}(\mathbb{R}^N)$, and Corollary 6.3 shows that $\mu = 0$.

7. This is the same kind of approximation argument as was done in Corollary 6.17. 

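8. We calculate that
$\sum_{j,k} F(x_j - x_k)\,\xi_j \bar{\xi}_k = \sum_{j,k} \int e^{-2\pi i (x_j - x_k) \cdot t}\,\xi_j \bar{\xi}_k\,d\mu(t)
 = \int \big( \sum_j e^{-2\pi i x_j \cdot t} \xi_j \big) \overline{\big( \sum_k e^{-2\pi i x_k \cdot t} \xi_k \big)}\,d\mu(t)
 = \int \big| \sum_j e^{-2\pi i x_j \cdot t} \xi_j \big|^2\,d\mu(t) \ge 0$.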

9. For the set {0}, the condition is that $F(0)|\xi_1|^2 \ge 0$ for all $\xi_1$; thus $F(0) \ge 0$. For
the set {x, 0}, the condition is that
$F(0)|\xi_1|^2 + F(x)\xi_1\bar{\xi}_2 + F(-x)\xi_2\bar{\xi}_1 + F(0)|\xi_2|^2 \ge 0$.
Taking $\xi_1 = \xi_2 = 1$ shows that $F(x) + F(-x)$ is real; taking $\xi_1 = i$ and $\xi_2 = 1$
shows that $i(F(x) - F(-x))$ is real. Therefore $F(x) + F(-x) = \overline{F(x)} + \overline{F(-x)}$
and $F(x) - F(-x) = -\overline{F(x)} + \overline{F(-x)}$. Adding, we obtain $F(-x) = \overline{F(x)}$. Hence
$-F(x)\xi_1\bar{\xi}_2 - \overline{F(x)}\,\bar{\xi}_1\xi_2 \le F(0)(|\xi_1|^2 + |\xi_2|^2)$. If $F(x) \ne 0$, we put $\xi_1 = -1$ and
$\xi_2 = F(x)/|F(x)|$ and obtain $|F(x)| \le F(0)$.

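10. $\sum_{i,j} F(x_i - x_j)\Phi(x_i - x_j)\,\xi_i \bar{\xi}_j = \sum_{i,j} \int F(x_i - x_j) e^{-2\pi i (x_i - x_j) \cdot t}\,\xi_i \bar{\xi}_j\,\varphi(t)\,dt
 = \int \big[ \sum_{i,j} F(x_i - x_j) (\xi_i e^{-2\pi i x_i \cdot t}) \overline{(\xi_j e^{-2\pi i x_j \cdot t})} \big] \varphi(t)\,dt \ge 0$.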

11. Part (a) follows from the boundedness of F obtained in Problem 9. 

In (b), every g in Ccom(RY) satisfies 0 < ff Fox — y)g@)g(y)dxdy = 
[Fox a) (xg@=f Fox 38) dy=f Fo@O)FO) dy=f FoOl@OI? ay. 

For (c), if f is in $L^2$, we can approximate f as closely as we like by a
member g of $C_{\mathrm{com}}(\mathbb{R}^N)$. Then
$f_0|\hat{g}|^2 = f_0|\mathcal{F}(f)|^2 + 2f_0\,\mathrm{Re}\big(\overline{\mathcal{F}(f)}\,(\hat{g} - \mathcal{F}(f))\big) + f_0|\hat{g} - \mathcal{F}(f)|^2$.
We integrate and use the resulting formula to compare $\int f_0|\hat{g}|^2\,dy$ with
$\int f_0|\mathcal{F}(f)|^2\,dy$. By the Schwarz inequality and the Plancherel formula, the absolute
value of the difference of these is $\le 2\|f_0\|_{\sup}\|f\|_2\|g - f\|_2 + \|f_0\|_{\sup}\|g - f\|_2^2$.
Since $\int f_0|\hat{g}|^2\,dy$ is $\ge 0$, it follows that $\int f_0|\mathcal{F}(f)|^2\,dy \ge 0$ for all f in $L^2$. Since
$\mathcal{F}(f)$ is an arbitrary $L^2$ function and $f_0$ is continuous, we conclude that $f_0$ is $\ge 0$.

The integrability in (d) is immediate from Lemma 8.7, and the formula $\int f_0\,dy = F_0(0)$
follows from the Fourier inversion formula.


12. Let $\epsilon_n$ be a sequence decreasing to 0, let $\Phi$ in Problem 11 be the function
$e^{-\pi \epsilon_n |x|^2}$, and write $F_n$ for the function $F\Phi$. Then Problem 11d shows that
$\mu_n = \hat{F}_n(y)\,dy$ is a finite Borel measure with $\mu_n(\mathbb{R}^N) = F_n(0) = F(0)$. The Helly-Bray
Theorem applies and produces a subsequence of $\{\mu_n\}$ convergent to a finite Borel
measure $\mu$ weak-star against $C_{\mathrm{com}}(\mathbb{R}^N)$. We shall prove that $F(x) = \int e^{2\pi i x \cdot y}\,d\mu(y)$,
i.e., that $\nu$ with $\nu(E) = \mu(-E)$ is the desired measure.

For each n, the Fourier inversion formula gives
$F_n(x) = \int e^{2\pi i x \cdot y} \hat{F}_n(y)\,dy = \int e^{2\pi i x \cdot y}\,d\mu_n(y)$. Since $F_n(x) \to F(x)$ pointwise,
the result would follow if we could say that the weak-star convergence implies that
$\int e^{2\pi i x \cdot y}\,d\mu_n(y) \to \int e^{2\pi i x \cdot y}\,d\mu(y)$.
However, $e^{2\pi i x \cdot y}$ is not compactly supported, and an additional argument is needed.

First we extend the weak-star convergence so that it applies to continuous functions
vanishing at infinity. If f is such a function, we can find a sequence $\{f_k\}$ in $C_{\mathrm{com}}(\mathbb{R}^N)$
converging to f uniformly. Then

$$\Big| \int f\,d\mu_n - \int f\,d\mu \Big|
 \le \Big| \int f\,d\mu_n - \int f_k\,d\mu_n \Big| + \Big| \int f_k\,d\mu_n - \int f_k\,d\mu \Big| + \Big| \int f_k\,d\mu - \int f\,d\mu \Big|$$
$$\le \|f_k - f\|_{\sup}\,\mu_n(\mathbb{R}^N) + \Big| \int f_k\,d\mu_n - \int f_k\,d\mu \Big| + \|f_k - f\|_{\sup}\,\mu(\mathbb{R}^N).$$

Choose k to make $\|f_k - f\|_{\sup}$ small. With k fixed, choose n to make the middle
term small. Then the right side is small since the numbers $\mu_n(\mathbb{R}^N)$ are bounded.

This is not quite good enough by itself because $e^{2\pi i x \cdot y}$ does not vanish at infinity.
However, averages of it by $L^1$ functions (i.e., Fourier transforms of $L^1$ functions)
vanish at infinity, and that will be enough for us.

Define $F^{\#}(x) = \int e^{2\pi i x \cdot y}\,d\mu(y)$. We prove that $F^{\#}(x) = F(x)$ for all x. It
is enough to prove that $\int F^{\#}\psi\,dx = \int F\psi\,dx$ for all $\psi$ in $L^1$. Define
$\psi^{\vee}(y) = \int e^{2\pi i x \cdot y} \psi(x)\,dx$. The multiplication formula (for $(\cdot)^{\vee}$ instead of
$(\cdot)^{\wedge}$) and the Riemann-Lebesgue Lemma give

$$\int F^{\#}\psi\,dx = \int \psi^{\vee}\,d\mu(y) = \lim_n \int \psi^{\vee}\,d\mu_n = \lim_n \int \psi^{\vee} \hat{F}_n\,dy
 = \lim_n \int \psi\,(\hat{F}_n)^{\vee}\,dy = \lim_n \int \psi F_n\,dy.$$

The right side equals $\int \psi F\,dy$ by dominated convergence since $|F_n(y)| \le |F(y)|$ for
all y.

13. Part (a) is easy. 

In (b), if $\chi$ is a character, then $\sum_x \chi(x) = \sum_x \chi(gx) = \chi(g) \sum_x \chi(x)$. Thus
$\sum_x \chi(x) = 0$ if there is some g with $\chi(g) \ne 1$, i.e., if $\chi$ is not trivial. If $\chi$ and $\chi'$
are distinct characters, then $\chi\overline{\chi'}$ is not trivial, and therefore $\sum_x \chi(x)\overline{\chi'(x)} = 0$. The
orthogonality implies the linear independence.

In (c), the element 1 of $J_m$ has order m under the group operation of addition.
Thus each character $\chi$ of $J_m$ must have $\chi(1)$ equal to an $m$th root of unity. Since 1
generates $J_m$, $\chi(1)$ determines $\chi$. Thus the listed characters are the only ones.

In (d), any tuple $(n_1, \dots, n_r)$ with $0 \le n_j < m_j$ for $1 \le j \le r$ defines a character
by $(k_1, \dots, k_r) \mapsto \prod_{j=1}^{r} (\zeta_{m_j}^{n_j})^{k_j}$. There are $\prod_{j=1}^{r} m_j$ distinct characters in this
list, and they are linearly independent by (b). Since $\dim L^2(G) = \prod_{j=1}^{r} m_j$, these characters
form a vector-space basis.
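
The orthogonality in (b) and the count in (d) can be checked numerically for a small product of cyclic groups. The sketch below assumes that G is a product of the groups of integers modulo $m_j$ and that $\zeta_m = e^{2\pi i/m}$; the helper names and the sample moduli are hypothetical.

```python
import cmath
from itertools import product


def character(n, moduli):
    """Character of the product of cyclic groups labeled by the tuple n."""
    def chi(k):
        value = 1
        for n_j, k_j, m_j in zip(n, k, moduli):
            value *= cmath.exp(2j * cmath.pi * n_j * k_j / m_j)
        return value
    return chi


def inner_product(chi1, chi2, moduli):
    """Sum over the group of chi1(x) times the conjugate of chi2(x)."""
    elements = product(*(range(m) for m in moduli))
    return sum(chi1(k) * chi2(k).conjugate() for k in elements)


moduli = (2, 3, 4)  # sample group of order 24
labels = list(product(*(range(m) for m in moduli)))
gram = {
    (a, b): inner_product(character(a, moduli), character(b, moduli), moduli)
    for a in labels
    for b in labels
}
```

The diagonal entries of `gram` come out as the order of the group and the off-diagonal entries vanish up to rounding, which is the numerical form of the orthogonality relation.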

14. Since the characters form a basis of $L^2(G)$ as a consequence of Problem 13d,
we have $f(t) = \sum_{\chi'} c_{\chi'} \chi'(t)$ for some constants $c_{\chi'}$. Multiply by $\overline{\chi(t)}$ and sum over
t to get $\hat{f}(\chi) = \sum_t \sum_{\chi'} c_{\chi'} \chi'(t) \overline{\chi(t)}$. The orthogonality in Problem 13b shows that
this equation simplifies to $\hat{f}(\chi) = c_\chi \sum_t |\chi(t)|^2 = |G| c_\chi$.
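
The resulting inversion formula $f(t) = |G|^{-1} \sum_\chi \hat{f}(\chi)\chi(t)$ can be tried out on a cyclic group. This is a minimal sketch, assuming G is the integers modulo m with characters $\chi_n(t) = e^{2\pi i n t/m}$ and $\hat{f}(\chi) = \sum_t f(t)\overline{\chi(t)}$; the function names and sample data are hypothetical.

```python
import cmath


def fourier_transform(f):
    """f_hat(n) = sum over t of f(t) times the conjugate of chi_n(t)."""
    m = len(f)
    return [
        sum(f[t] * cmath.exp(-2j * cmath.pi * n * t / m) for t in range(m))
        for n in range(m)
    ]


def inverse_transform(f_hat):
    """f(t) = (1/m) times the sum over n of f_hat(n) chi_n(t)."""
    m = len(f_hat)
    return [
        sum(f_hat[n] * cmath.exp(2j * cmath.pi * n * t / m) for n in range(m)) / m
        for t in range(m)
    ]


f = [3, -1, 4, 1, -5, 9]  # sample function on the integers modulo 6
recovered = inverse_transform(fourier_transform(f))
```

The factor 1/m in `inverse_transform` is the constant $|G|^{-1}$ that the computation $\hat{f}(\chi) = |G| c_\chi$ produces.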
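
15. $\hat{F}(\dot{\chi}) = \sum_{\dot{t} \in G/H} F(\dot{t})\,\overline{\dot{\chi}(\dot{t})} = \sum_{\dot{t} \in G/H} \sum_{h \in H} f(t + h)\,\overline{\chi(t + h)} = \sum_{t \in G} f(t)\,\overline{\chi(t)}
 = \hat{f}(\chi)$.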

16. The characters of G are the ones with $\chi_n(1) = \zeta_m^n$ for $0 \le n < m$. Such a
character is trivial on H if and only if $\chi_n(q) = 1$, i.e., if and only if $\zeta_m^{nq} = 1$; this
means that nq is a multiple of m, hence that n is a multiple of p.

The element 1 of H is the element q of G. Thus the question about the identification
of the descended characters asks the value of $\chi_n(1)$ when n is a multiple jp of p. The
value is $\chi_n(1) = \zeta_m^{jp} = \zeta_q^j$.

If we have computed F on G/H and want to compute $\hat{F}$ from the definition of
Fourier transform, we have to multiply each of the q values of F by the values of
each of the q characters of G/H and then add. The number of multiplications is $q^2$.
The actual computation of F from f involves p additions for each of the q values of
t, hence pq additions.

17. $\hat{f}(\chi_{jp+k}) = \sum_t f(t)\,\overline{\chi_{jp+k}(t)} = \sum_t f(t)\,\zeta_m^{-(jp+k)t} = \sum_t \big[ f(t)\,\zeta_m^{-kt} \big]\,\zeta_m^{-jpt}$. The variant of f for
the number k is then $t \mapsto f(t)\,\zeta_m^{-kt}$. Handling each value of k involves m = pq steps
to compute the variant of f and then the $q^2 + pq$ steps of Problem 16. Thus we have
$q^2 + 2pq$ steps for each k, which we regard as on the order of $q^2 + pq$. This means
$p(q^2 + pq)$ steps when all k's are counted, hence $pq(p + q)$ steps.
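
The two-stage procedure of Problems 16 and 17 can be written out and compared with the direct transform. The sketch below is an illustration under stated assumptions, not the text's own algorithm statement: G is taken to be the integers modulo m = pq, H is the subgroup of multiples of q, $\zeta_m = e^{2\pi i/m}$, and all function names are hypothetical.

```python
import cmath


def zeta(m, exponent):
    """The power zeta_m ** exponent, where zeta_m = exp(2 pi i / m)."""
    return cmath.exp(2j * cmath.pi * exponent / m)


def direct_transform(f):
    """Fourier transform on the integers modulo m straight from the definition."""
    m = len(f)
    return [sum(f[t] * zeta(m, -n * t) for t in range(m)) for n in range(m)]


def two_stage_transform(f, p, q):
    """Compute the same transform by passing to the quotient by H for each k."""
    m = p * q
    assert len(f) == m
    f_hat = [0j] * m
    for k in range(p):
        # Variant of f for the number k: pq multiplications.
        variant = [f[t] * zeta(m, -k * t) for t in range(m)]
        # Sum the variant over cosets of H = multiples of q: pq additions.
        F = [sum(variant[s + h * q] for h in range(p)) for s in range(q)]
        # Transform on the quotient group of order q: q * q multiplications.
        for j in range(q):
            f_hat[j * p + k] = sum(F[s] * zeta(q, -j * s) for s in range(q))
    return f_hat


def step_count(p, q):
    """Step count from the hint: q^2 + 2pq steps for each of the p values of k."""
    return p * (q * q + 2 * p * q)
```

For p = 3 and q = 4 the count is 3(16 + 24) = 120 steps, compared with $m^2 = 144$ multiplications for the direct definition; the saving grows as p and q grow.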


Chapter IX


1. Let r = q/p, and let r' be the dual index. Regard $|f|^p$ as a product $|f|^p \cdot 1$, and
apply Hölder's inequality with $|f|^p$ to be raised to the r power and 1 to be raised to
the r' power. Compare with Problem 3 below, which is a more complicated version
of the same thing.


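2. The inequality is routine if any of the indices is $\infty$. Otherwise, we have

$$\int |fgh|\,d\mu \le \Big( \int |fg|^{r'}\,d\mu \Big)^{1/r'} \Big( \int |h|^r\,d\mu \Big)^{1/r}$$
$$\le \Big( \int (|f|^{r'})^{p/r'}\,d\mu \Big)^{1/p} \Big( \int (|g|^{r'})^{q/r'}\,d\mu \Big)^{1/q} \|h\|_r$$
$$= \|f\|_p \|g\|_q \|h\|_r.$$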
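
The chain of inequalities can be tested on a finite measure space. This is a minimal sketch, assuming indices with 1/p + 1/q + 1/r = 1 and a measure given by point weights; the sample data and the helper name are hypothetical.

```python
def lp_norm(values, weights, exponent):
    """L^exponent norm with respect to the measure that assigns weights to points."""
    total = sum(w * abs(v) ** exponent for v, w in zip(values, weights))
    return total ** (1.0 / exponent)


weights = [0.5, 1.0, 2.0, 0.25]   # sample measure on a four-point space
f = [1.0, -2.0, 0.5, 3.0]
g = [2.0, 0.1, -1.5, 1.0]
h = [-1.0, 4.0, 0.3, 2.0]
p, q, r = 2.0, 3.0, 6.0           # 1/2 + 1/3 + 1/6 = 1
r_dual = r / (r - 1.0)            # dual index r' with 1/r + 1/r' = 1

fg = [a * b for a, b in zip(f, g)]
left = sum(w * abs(a * b) for a, b, w in zip(fg, h, weights))
middle = lp_norm(fg, weights, r_dual) * lp_norm(h, weights, r)
right = lp_norm(f, weights, p) * lp_norm(g, weights, q) * lp_norm(h, weights, r)
```

Here `left`, `middle`, and `right` correspond to the three lines of the display, so they should come out in increasing order.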


3. Let us say that $\|f_n\|_p \le C$. Let $\epsilon > 0$ be given. By Egoroff's Theorem, find
E with $\mu(E) < \epsilon$ such that $f_n$ tends to f uniformly on $E^c$. Application of Hölder's
inequality with the exponent r = p/q and dual index r' = p/(p - q) to $\int_E |f_n|^q \cdot 1\,d\mu$
gives $\|f_n I_E\|_q \le \big( \int_E |f_n|^p\,d\mu \big)^{1/p} \big( \int_E 1\,d\mu \big)^{(p-q)/(pq)} \le C\mu(E)^{(p-q)/(pq)} \le C\epsilon^{(p-q)/(pq)}$.
Meanwhile, we have

$$\|f_n - f\|_q \le \|f_n - f_n I_{E^c}\|_q + \|f_n I_{E^c} - f I_{E^c}\|_q + \|f I_{E^c} - f\|_q$$
$$= \|f_n I_E\|_q + \|(f_n - f) I_{E^c}\|_q + \|f I_E\|_q.$$

The first term on the right is $\le C\epsilon^{(p-q)/(pq)}$, and so is the third term, by Fatou's
Lemma. The middle term tends to 0 as n tends to infinity because of the uniform
convergence. Thus $\limsup_n \|f_n - f\|_q \le 2C\epsilon^{(p-q)/(pq)}$. Since $\epsilon$ is arbitrary,
$\limsup_n \|f_n - f\|_q = 0$.

4. $L^1$ is 0, and $L^\infty$ consists of the constant functions. All the constant functions
give the same linear functional on $L^1$ because the integral of the product of any
constant function and the 0 function is 0.

5. Put P' = {f(x) > 0}, N' = {f(x) < 0}, and Z' = {f(x) = 0}. If E is any
measurable subset of Z', then $X = P \cup N$ with $P = P' \cup E$ and $N = N' \cup (Z' - E)$ is
a Hahn decomposition. All other Hahn decompositions are obtained by adjusting P
and N by taking the symmetric difference of P and of N with any set of $\mu$ measure 0.

6. In (a), let X be the positive integers, and let the algebra consist of all finite
subsets and their complements; let $\nu$ of a finite set be the number of elements in the
set, and let $\nu$ of the complement of a finite set F be $-\nu(F)$. In (b), use the same
X and algebra, define $\nu(\{2k\}) = 2^{-k}$ and $\nu(\{2k - 1\}) = -2^{-k}$, and extend $\nu$ to be
completely additive. In (c), let X = [0, 1], let the $\sigma$-algebra consist of the Borel sets,
and take $\nu$ to be Lebesgue measure and $\mu$ to be counting measure.

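7. Since $P_r$ has $L^1$ norm 1, the inequality $\|u(r, \cdot)\|_p \le \|f\|_p$ follows from
Minkowski's inequality for integrals. For the limiting behavior as r increases to 1,
we extend f periodically and write

$$u(r, \theta) - f(\theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} P_r(\varphi) f(\theta - \varphi)\,d\varphi - f(\theta)
 = \frac{1}{2\pi} \int_{-\pi}^{\pi} P_r(\varphi) [f(\theta - \varphi) - f(\theta)]\,d\varphi,$$

the second step following since $\frac{1}{2\pi} \int_{-\pi}^{\pi} P_r\,d\varphi = 1$. Applying Minkowski's inequality
for integrals, we obtain

$$\|u(r, \cdot) - f\|_p \le \frac{1}{2\pi} \int_{-\pi}^{\pi} P_r(\varphi)\,\|f(\theta - \varphi) - f(\theta)\|_{p,\theta}\,d\varphi$$

since $P_r \ge 0$. The integration on the right is broken into two sets, $S_1 = (-\delta, \delta)$ and
$S_2 = [-\pi, -\delta] \cup [\delta, \pi]$, and the integral is

$$\le \frac{1}{2\pi} \int_{S_1} P_r(\varphi) \Big( \sup_{\varphi \in S_1} \|f(\theta - \varphi) - f(\theta)\|_{p,\theta} \Big)\,d\varphi + \frac{1}{2\pi} \int_{S_2} P_r(\varphi)\,2\|f\|_p\,d\varphi$$
$$\le \sup_{\varphi \in S_1} \|f(\theta - \varphi) - f(\theta)\|_{p,\theta} + 2\|f\|_p \sup_{\varphi \in S_2} P_r(\varphi).$$

Let $\epsilon > 0$ be given. If $\delta$ is sufficiently small, Proposition 9.11 shows that the first
term is $< \epsilon$. With $\delta$ fixed, we can then choose r close enough to 1 to make the second
term $< \epsilon$.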

8. Let p be the dual index to p'. Put r/R = r' in Problem 13 at the end of
Chapter IV, so that

$$u(r'R, \theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f_R(\varphi) P_{r'}(\theta - \varphi)\,d\varphi$$

for r' < 1. Take a sequence of R's increasing to 1, and let $\{R_n\}$ be a subsequence such
that $\{f_{R_n}\}$ converges weak-star in $L^p$ relative to $L^{p'}$. Let the limit be f. For each $\theta$
and r', $P_{r'}(\theta - \cdot)$ is in $L^{p'}$, and the equality
$u(r'R_n, \theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f_{R_n}(\varphi) P_{r'}(\theta - \varphi)\,d\varphi$
thus gives $u(r', \theta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\varphi) P_{r'}(\theta - \varphi)\,d\varphi$, which is the desired result.

9. If $\nu$ is a measure with $0 \le \nu \le \mu$, then $\nu(\{n\}) = 0$ for every n, and hence
$\nu(\{\text{integers}\}) = 0$. So $\nu = 0$.

10. Let $\mu$ be given on the space X, and consider the set S of all completely additive
$\nu$ with $0 \le \nu \le \mu$. This contains 0 and hence is nonempty. Order S by saying that
$\nu_1 \le \nu_2$ if $\nu_1(E) \le \nu_2(E)$ for all E. If we are given a chain $\{\nu_\alpha\}$, let $C = \sup_\alpha \nu_\alpha(X)$.
This is $\le \mu(X)$ and hence is finite. Choose a sequence $\{\nu_{\alpha_n}\}$ from the chain with
$\nu_{\alpha_n}(X)$ monotone increasing with limit C.

If $m \le n$, let us see that $\nu_{\alpha_m} \le \nu_{\alpha_n}$. Since the $\nu_\alpha$'s form a chain, the only way
this can fail is to have $\nu_{\alpha_m}(E) > \nu_{\alpha_n}(E)$ for some E and also $\nu_{\alpha_m}(E^c) \ge \nu_{\alpha_n}(E^c)$.
But then $\nu_{\alpha_m}(X) > \nu_{\alpha_n}(X)$ by additivity, and this contradicts the fact that $\nu_{\alpha_n}(X)$ is
monotone increasing. So $m \le n$ implies $\nu_{\alpha_m} \le \nu_{\alpha_n}$.

Define $\nu_0(E) = \lim_n \nu_{\alpha_n}(E)$. Corollary 1.14 shows that $\nu_0$ is completely additive,
and certainly $\nu_0 \le \mu$. So $\nu_0$ is an upper bound for the chain. Zorn's Lemma therefore
shows that S has a maximal element $\nu$.

Write $\rho = \mu - \nu$. This is bounded nonnegative additive as a result of the
construction. If there were a completely additive $\lambda$ such that $0 \le \lambda \le \rho$, then
$\nu + \lambda$ would contradict the construction of $\nu$ from Zorn's Lemma. Thus $\rho$ is purely
finitely additive.


11. It is enough to prove that $\mu$ is completely additive. If the contrary is the case,
then there exists an increasing sequence of sets $E_n$ with union E in the algebra such
that the monotone increasing sequence $\{\mu(E_n)\}$ does not have limit $\mu(E)$. Since $\mu$
is nonnegative additive, $\mu(E_n) \le \mu(E)$ for all n. Thus $\lim_n \mu(E_n) < \mu(E)$. Since
$\nu - \mu$ is nonnegative additive, $\nu - \mu$ similarly has $\lim_n (\nu - \mu)(E_n) \le (\nu - \mu)(E)$.
Adding, we obtain $\lim_n \nu(E_n) < \nu(E)$, in contradiction to the complete additivity
of $\nu$.


12. Suppose $\mu$ is nonnegative bounded additive. Let $\mu = \nu_1 + \rho_1 = \nu_2 + \rho_2$ with
$\nu_1$ and $\nu_2$ nonnegative completely additive and with $\rho_1$ and $\rho_2$ nonnegative purely
finitely additive. Then $\nu_1 - \nu_2 = \rho_2 - \rho_1$. Let $\nu^+ - \nu^-$ be the Jordan decomposition
of $\nu_1 - \nu_2$. Since $\nu_1 - \nu_2$ is completely additive, so are $\nu^+$ and $\nu^-$. The equality
$\nu^+ - \nu^- = \rho_2 - \rho_1$ and the minimality of the Jordan decomposition together imply
that $0 \le \nu^+ \le \rho_2$ and $0 \le \nu^- \le \rho_1$. Problem 11 then shows that $\nu^+ = \nu^- = 0$.
Hence $\nu_1 - \nu_2 = 0$, $\nu_1 = \nu_2$, and $\rho_1 = \rho_2$.

13. Let $R = I \times J$ be centered at (x, y). Then

$$\frac{1}{m(R)} \iint_R |f(u, v)|\,dv\,du = \frac{1}{m(I)} \int_I \Big[ \frac{1}{m(J)} \int_J |f(u, v)|\,dv \Big]\,du
 \le \frac{1}{m(I)} \int_I f_1(u, y)\,du \le f_2(x, y).$$

Taking the supremum over R gives $f^{**}(x, y) \le f_2(x, y)$.

14. $\iint |f^{**}(x, y)|^p\,dx\,dy \le \iint |f_2(x, y)|^p\,dx\,dy = \int \big[ \int |f_2(x, y)|^p\,dx \big]\,dy
 \le A_p^p \int \big[ \int |f_1(x, y)|^p\,dx \big]\,dy$ by Corollary 9.21. If we interchange integrals and apply
Corollary 9.21 a second time, we see that this is
$\le A_p^{2p} \int \big[ \int |f(x, y)|^p\,dy \big]\,dx = A_p^{2p} \|f\|_p^p$.

15. This is done in the style of Corollary 6.39. 
16. Let $\Phi_1 \ge 0$ be a decreasing $C^1$ function on [0, 1] with $\Phi_1'(0) = 0$, $\Phi_1(1) = 1$,
and $\Phi_1'(1) = -1$. Define $\Phi_0(x)$ to be $\Phi_1(x)/(\pi(1 + x^2))$ on [0, 1] and to
be $1/(\pi x(1 + x^2))$ on $[1, +\infty)$. Then $\Phi(x) = \Phi_0(|x|)$ has the required property.


17. $\sup_{\epsilon > 0} |(\psi_\epsilon * f)(x)| \le \sup_{\epsilon > 0} (|\psi_\epsilon| * |f|)(x) \le \sup_{\epsilon > 0} (\Phi_\epsilon * |f|)(x)$, and
then $\sup_{\epsilon > 0} |(\psi_\epsilon * f)(x)| \le Cf^*(x)$ by Corollary 6.42. Since $\int \psi(x)\,dx = 0$, the
last part of the proof of Corollary 6.42 shows that $\lim_{\epsilon \downarrow 0} (\psi_\epsilon * f)(x) = 0$ a.e. for f
in $L^1(\mathbb{R}^1)$. If f is in $L^\infty(\mathbb{R}^1)$ and a bounded interval is specified, we can write f as
the sum of an $L^1$ function carried on that interval and an $L^\infty$ function vanishing on
that interval. The $L^1$ part is handled by the previous case, and the $L^\infty$ part is handled
on that bounded interval by Theorem 6.20c.


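18. We use the fact that $Q_\epsilon = h_\epsilon + \psi_\epsilon$, where $\psi$ is integrable with integral 0.
Since $h_\epsilon * f$ and $\psi_\epsilon * f$ are in $L^p$, so is $Q_\epsilon * f$. Convolution by an $L^1$ function
such as $P_\epsilon$ is continuous on $L^p$ by Proposition 9.10. With all limits being taken in
$L^p$ as $\epsilon' \downarrow 0$, we have
$P_\epsilon * (Hf) = P_\epsilon * (\lim (h_{\epsilon'} * f)) = \lim P_\epsilon * (h_{\epsilon'} * f)
 = \lim P_\epsilon * (Q_{\epsilon'} * f - \psi_{\epsilon'} * f) = \lim P_\epsilon * (Q_{\epsilon'} * f) - (\lim P_\epsilon * \psi_{\epsilon'}) * f$.
The second term on the right side is 0. If we think of $P_\epsilon$ as in $L^1$ and $Q_{\epsilon'}$ as in $L^{p'}$, then we have
$P_\epsilon * (Q_{\epsilon'} * f) = (P_\epsilon * Q_{\epsilon'}) * f = Q_{\epsilon + \epsilon'} * f = (P_{\epsilon'} * Q_\epsilon) * f = P_{\epsilon'} * (Q_\epsilon * f)$.
Thus $\lim P_\epsilon * (Q_{\epsilon'} * f) = \lim P_{\epsilon'} * (Q_\epsilon * f) = Q_\epsilon * f$, and we conclude that
$P_\epsilon * (Hf) = Q_\epsilon * f$.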

19. $\sup_{\epsilon > 0} |(h_\epsilon * f)(x)| \le \sup_{\epsilon > 0} |(Q_\epsilon * f)(x)| + \sup_{\epsilon > 0} |(\psi_\epsilon * f)(x)|
 \le \sup_{\epsilon > 0} |(P_\epsilon * Hf)(x)| + Cf^*(x) \le C'(Hf)^*(x) + Cf^*(x)$, the last inequality
following from Corollary 6.42 for $P_\epsilon$. Let $1 < p < \infty$. Then it follows from
Corollary 9.21 that $\| \sup_{\epsilon > 0} |h_\epsilon * f| \|_p \le C_p(\|Hf\|_p + \|f\|_p)$, and we conclude
from Theorem 9.23c that $\| \sup_{\epsilon > 0} |h_\epsilon * f| \|_p \le D_p\|f\|_p$. Lemma 9.24 shows that
$\lim_{\epsilon \downarrow 0} (h_\epsilon * f)(x) = (Hf)(x)$ everywhere if f is in a certain dense subspace of $L^p$, and
it follows as in Problem 15 that $\lim_{\epsilon \downarrow 0} (h_\epsilon * f)(x) = (Hf)(x)$ almost everywhere if f is
arbitrary in $L^p$.

20. Imitating the proof of parts (a) and (b) of Fejér's Theorem (Theorem 6.48),
we readily prove that $K_n * f \to f$ in $L^p$, where $K_n$ is the Fejér kernel. Therefore
finite linear combinations of the exponentials are dense in $L^p([-\pi, \pi])$. For each
such linear combination f of exponentials, we have $S_n f = f$ for all sufficiently large
n, and hence $S_n f \to f$ in $L^p$ for a dense subset of $L^p$. Using the given estimate
on $\|S_n f\|_p$ and the convergence of $S_n f$ on the dense set, we argue as in the proof of
Theorem 9.23b to deduce convergence for all f in $L^p$.


21. Let $F_n(t) = \dfrac{2\sin(n + \frac{1}{2})t}{t}$ for $0 < |t| \le \pi$, and extend $F_n$ periodically.
Then $\frac{t}{2} F_n(t) = \sin(n + \frac{1}{2})t = (\sin \frac{1}{2}t) D_n(t)$. Since $(t/2)/\sin \frac{1}{2}t = 1 + t\psi(t)$
with $\psi(t)$ bounded above and below by positive constants on $[-\pi, \pi]$, we see that
$D_n(t) - F_n(t) = \Big[ \dfrac{t/2}{\sin \frac{1}{2}t} - 1 \Big] F_n(t) = 2\psi(t) \sin(n + \frac{1}{2})t$. Then the functions
$\psi_n(t) = 2\psi(t) \sin(n + \frac{1}{2})t$ have $D_n - F_n = \psi_n$ and $\|\psi_n\|_1$ bounded. By inspection,


f . 2si 7 
F,,—E,, equals the function that is ae 


for |t| < x4, andisOfor x4, < |t| <x. 
These functions are < 2(n + 5) for |t| < st and are 0 otherwise; so their L! norms 
are bounded. This proves that D, — E, = @, with ||g, ||, < C for some C. 

If $\|T_n f\|_p \le B_p\|f\|_p$, then we have
$\|S_n f\|_p = \|D_n * f\|_p = \|E_n * f + \varphi_n * f\|_p \le \|E_n * f\|_p + \|\varphi_n * f\|_p \le B_p\|f\|_p + \|\varphi_n\|_1\|f\|_p$,
and we can take $A_p = B_p + C$.
22. We have 2i sin(n + 5)t = eilm+3) _ e+)" Thus the effect of the operator 
T, on f is the sum of two terms TY f + T° f, one of which is 


~if(a— te t+ ZIAD gin z)x 
dt. 


IOFG)= |, 


t 
mar SItls* 


If we regard f as continued periodically to the interval [—3zr, 377] and we put f equal 
to 0 outside that interval, then 


T) fe) =e! D* (Hy — Hyongy)g)(x) forx € [-2, 7], 


where g(y) = ~inf (ye i@t VO) on [—3z, 32]. With A, as the constant from 
Theorem 9.23, Theorem 9.23 gives 


< (fe ITO FO)? dx)!” 
< (Se |Hz g|? dx)'!” a (te lA ental? dx)'/? 
<2Ap( fe lg|? dx)” < 20 A, (3, {7 |fl? dx)”. 


(7, IT f(xy? dx)? 


We get a similar estimate for re f and the desired estimate for T,, f follows. 

23. Define a signed measure $\nu$ on $\mathcal{B}$ by $\nu(B) = \int_B f\,d\mu$. Then $\nu$ is absolutely
continuous with respect to the restriction of $\mu$ to $\mathcal{B}$, and the Radon-Nikodym Theorem
yields a function g measurable with respect to $\mathcal{B}$ such that $\nu(B) = \int_B g\,d\mu$ for all B
in $\mathcal{B}$. This function g is $E[f|\mathcal{B}]$. Uniqueness is built into the uniqueness aspect of
the Radon-Nikodym Theorem.

24. For those n's such that $\mu(X_n) \ne 0$, $E[f|\mathcal{B}]$ may be defined to be equal
everywhere on $X_n$ to the constant $\mu(X_n)^{-1} \int_{X_n} f\,d\mu$. For definiteness, $E[f|\mathcal{B}]$ may
be defined to be 0 on each $X_n$ with $\mu(X_n) = 0$.
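
The recipe in this hint is easy to carry out on a finite example. The following is a minimal sketch, assuming a finite space with point masses and a finite partition into blocks $X_n$ (the problem itself allows a countable partition); the function name and the sample data are hypothetical.

```python
from fractions import Fraction


def conditional_expectation(f, mu, partition):
    """Average f over each block of the partition with respect to mu.

    On a block of positive measure the value is the mu-average of f;
    on a block of measure 0 the value is set to 0, as in the hint.
    """
    result = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        if mass != 0:
            average = sum(f[x] * mu[x] for x in block) / mass
        else:
            average = Fraction(0)
        for x in block:
            result[x] = average
    return result


# Sample measure, function, and partition on the five-point space {0, 1, 2, 3, 4}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(0),
      3: Fraction(1, 8), 4: Fraction(1, 8)}
f = {0: Fraction(2), 1: Fraction(-4), 2: Fraction(7),
     3: Fraction(1), 4: Fraction(3)}
partition = [{0, 1}, {2}, {3, 4}]
cond = conditional_expectation(f, mu, partition)
```

The result is constant on each block, which is the measurability property (i), and integrating it over any block gives the same value as integrating f, which is property (ii).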

25. The function f satisfies the defining properties (i) and (ii) of $E[f|\mathcal{A}]$.

26. In (a), we identify $E[E[f|\mathcal{B}] \mid \mathcal{C}]$ as $E[f|\mathcal{C}]$. It is measurable with respect to $\mathcal{C}$
and hence satisfies (i) toward being $E[f|\mathcal{C}]$. Any $C \in \mathcal{C}$ has
$\int_C E[E[f|\mathcal{B}] \mid \mathcal{C}]\,d\mu = \int_C E[f|\mathcal{B}]\,d\mu$. In turn this equals $\int_C f\,d\mu$ since C is in $\mathcal{B}$.
Hence $E[E[f|\mathcal{B}] \mid \mathcal{C}]$ satisfies (ii) toward being $E[f|\mathcal{C}]$.

In (b), we identify $E[f|\mathcal{B}] + E[g|\mathcal{B}]$ as $E[f + g \mid \mathcal{B}]$. It is measurable with respect
to $\mathcal{B}$ and hence satisfies (i). For (ii), each B in $\mathcal{B}$ has
$\int_B (E[f|\mathcal{B}] + E[g|\mathcal{B}])\,d\mu = \int_B E[f|\mathcal{B}]\,d\mu + \int_B E[g|\mathcal{B}]\,d\mu = \int_B f\,d\mu + \int_B g\,d\mu = \int_B (f + g)\,d\mu$.

In (c), it is enough to handle $f \ge 0$, and then it is enough to handle $g \ge 0$. If $g = I_B$
with $B \in \mathcal{B}$, then we shall identify $I_B E[f|\mathcal{B}]$ as $E[f I_B \mid \mathcal{B}]$. Certainly $I_B E[f|\mathcal{B}]$
satisfies (i). For (ii), each B' in $\mathcal{B}$ has
$\int_{B'} I_B E[f|\mathcal{B}]\,d\mu = \int_{B \cap B'} E[f|\mathcal{B}]\,d\mu = \int_{B \cap B'} f\,d\mu = \int_{B'} I_B f\,d\mu$.
This handles g equal to an indicator function. Part (b)
allows us to handle g equal to a simple function, and monotone convergence allows
us to handle g equal to any nonnegative integrable function. (For this last conclusion
one needs to use that $f \ge 0$ implies $E[f|\mathcal{B}] \ge 0$, but this is built into the construction
via the Radon-Nikodym Theorem.)

In (d), the important thing is that X is a set in $\mathcal{B}$. Then (ii) and (c) successively
give $\int_X f E[g|\mathcal{B}]\,d\mu = \int_X E[f E[g|\mathcal{B}] \mid \mathcal{B}]\,d\mu = \int_X E[f|\mathcal{B}] E[g|\mathcal{B}]\,d\mu$. The right
side is symmetric in f and g, and hence the left side is also.


Chapter X 


1. For (a), the diagonal $\Delta = \{(y, y) \in Y \times Y\}$ is a closed subset of $Y \times Y$ since
Y is Hausdorff, and the function $F : X \to Y \times Y$ given by $F(x) = (f(x), g(x))$ is
continuous. Therefore $F^{-1}(\Delta)$ is closed.


2. The argument is the same as for Problem 18 in Chapter II. 


3. We argue as in the proof of Theorem 2.53. Taking complements, we see that it is
enough to prove that the intersection of countably many open dense sets is nonempty.
Suppose that $U_n$ is open and dense for $n \ge 1$. Let $x_1$ be in $U_1$. Since $U_1$ is open,
local compactness and regularity together allow us to find an open neighborhood $B_1$
of $x_1$ with $B_1^{\mathrm{cl}}$ compact and $B_1^{\mathrm{cl}} \subseteq U_1$. We construct inductively points $x_n$ and open
neighborhoods $B_n$ of them such that $B_n \subseteq U_1 \cap \cdots \cap U_n$ and $B_n^{\mathrm{cl}} \subseteq B_{n-1}$. Suppose $B_n$
with $n \ge 1$ has been constructed. Since $U_{n+1}$ is dense and $B_n$ is nonempty and open,
$U_{n+1} \cap B_n$ is not empty. Let $x_{n+1}$ be a point in $U_{n+1} \cap B_n$. Since $U_{n+1} \cap B_n$ is open,
we can find an open neighborhood $B_{n+1}$ of $x_{n+1}$ in $U_{n+1}$ such that $B_{n+1}^{\mathrm{cl}} \subseteq U_{n+1} \cap B_n$.
Then $B_{n+1}$ has the required properties, and the inductive construction is complete.
The sets $B_n^{\mathrm{cl}}$ have the finite-intersection property, and they are closed subsets of $B_1^{\mathrm{cl}}$,
which is compact. By Proposition 10.11 their intersection is nonempty. Let x be in
the intersection. For any integer N, the inequality n > N implies that $x_n$ is in $B_{N+1}$.
Thus x is in $B_{N+1}^{\mathrm{cl}} \subseteq B_N \subseteq U_1 \cap \cdots \cap U_N$. Since N is arbitrary, x is in $\bigcap_{n=1}^{\infty} U_n$.


4. Let Y be a locally compact dense subset of the Hausdorff space X. If y is in
Y, let N be a relatively open neighborhood of y such that $N \subseteq K$ with K compact in
Y. Since N is relatively open, $N = U \cap Y$ for some open U in X. It will be proved
that N = U, so that each point of Y has an X open neighborhood, and then Y will
be open. The set K is compact in X and must be closed since X is Hausdorff. The
points of $U \cap K$ are in Y since $K \subseteq Y$, and hence $U \cap K \subseteq U \cap Y = N$. Consider
a point x of the open set U - K. Suppose x is not in Y. Then x is a limit point of Y
since Y is dense. Hence the open neighborhood U - K of x contains a point y' of Y.
Then y' is in $U \cap Y = N \subseteq K$ and cannot be in U - K, contradiction. We conclude
that x is in Y. Then x is in $U \cap Y = N$, and U = N.


5. First consider any continuous function $f : Y^* \to [0, 1]$ with $f(y_\infty) = 0$. The
set of y's with f(y) < 1/k is open and contains $y_\infty$; thus its complement is a compact
subset of Y and must be finite. Hence the set of y's with f(y) = 0 has a countable complement.

If Z is normal, apply Urysohn's Lemma to A and B, obtaining a continuous
$F : Z \to [0, 1]$ with F(A) = 1 and F(B) = 0. Enumerate the members of X as
$x_1, x_2, \dots$. For fixed n, $f(y) = F(x_n, y)$ is continuous from $Y^*$ to [0, 1] and is 0 at
$y_\infty$. Thus $F(x_n, y) > 0$ only on a countable set $S_n$ of y's, and $F(x_n, y) > 0$ for some
n at most on the countable set $S = \bigcup_{n=1}^{\infty} S_n$. If $y_0$ is not in S, then $x \mapsto F(x, y_0)$
is continuous from $X^*$ to [0, 1], is 0 for every x other than $x_\infty$, and is 1 at $x_\infty$. This
contradicts the continuity, and we conclude that Z is not normal.


6. If E is an infinite set with no limit point, then E is closed and each x in E is
relatively open. Hence each x has an open set $U_x$ in X with $U_x \cap E = \{x\}$. These
open sets and $E^c$ cover X, and there is no finite subcover. Thus X compact implies
that each infinite subset has a limit point.


8. Part (a) follows from Problem 7b and Proposition 10.34. For (b), $f^{-1}(-\infty, a)$
is $\emptyset$ if $a \le 0$, is $\mathbb{R} - \{0\}$ if $0 < a \le 1$, and is $\mathbb{R}$ if a > 1; hence it is open in
every case. Part (d) follows from (a). For (e), there exists an upper semicontinuous
function $\ge f(x)$, namely the constant function everywhere equal to $\sup |f(x)|$. Then
(d) shows that the pointwise infimum over all upper semicontinuous functions $\ge f(x)$
meets the conditions on $f^-$.


9. For (a), we have $Q_f(x) = f^-(x) + (-f)^-(x)$. Both terms on the right are
upper semicontinuous, and the sum is upper semicontinuous by Problem 8c. For (c),
$f_-(x) \le f(x) \le f^-(x) = Q_f(x) + f_-(x)$. If $Q_f = 0$, then $f_- = f = f^-$ shows
that f is continuous with respect to all sets {x < b} and all sets {x > a}. Hence
$f^{-1}(a, b)$ is open for every a and b, and f is continuous with respect to the metric
topology. Conversely if f is continuous, then the definition makes $f^- = f$ and
$(-f)^- = -f$. Therefore $f^- = f_- = f$ and $Q_f = f^- - f_- = 0$.

10. In (a), that subset of pairs is $(A \times A) \cup (B \times B) \cup \{(x, x) \mid x \in X\}$, which is
the union of three closed sets and hence is closed. In (b), let X be a Hausdorff space
that is not normal, and take A and B to be disjoint closed sets that cannot be separated
by open sets.


11. In (a), $q^{-1}q(x) = p_2((\{x\} \times X) \cap R)$, where $p_2$ is the projection to the
second coordinate of $X \times X$. Since {x} is closed and X is compact and R is closed,
$(\{x\} \times X) \cap R$ is compact. Then $q^{-1}q(x)$ is compact, hence closed, being the
continuous image of a compact set.

In (b), we have $p_2((U^c \times X) \cap R) = \{y \in X \mid (x, y) \in R \text{ for some } x \in U^c\}
 = \{y \in X \mid q^{-1}q(y) \cap U^c \ne \emptyset\} = \{y \in X \mid q^{-1}q(y) \subseteq U\}^c = V^c$. Since U is open,
the left side is closed, by the same considerations as in (a). Thus $V^c$ is closed, and V
is open.

In (c), let q(x) and q(y) be distinct points of $X/\!\sim$. By (a), the disjoint subsets
$q^{-1}q(x)$ and $q^{-1}q(y)$ are closed. Since X is normal, find disjoint open sets $U_1$ and $U_2$
containing $q^{-1}q(x)$ and $q^{-1}q(y)$, respectively. Let $V_1 = \{z \in X \mid q^{-1}q(z) \subseteq U_1\}$
and $V_2 = \{z \in X \mid q^{-1}q(z) \subseteq U_2\}$. These are disjoint sets, and they are open by
(b). Then $q(V_1)$ is open in $X/\!\sim$ because $q^{-1}q(V_1) = V_1$ is open, and similarly
$q(V_2)$ is open. The sets $q(V_1)$ and $q(V_2)$ are disjoint because $q^{-1}q(V_1) = V_1$ and
$q^{-1}q(V_2) = V_2$ are disjoint. Thus $q(V_1)$ and $q(V_2)$ are the required open sets
separating q(x) and q(y).

For (d), part (c) shows that $X/\!\sim$ is Hausdorff, and therefore its compact subsets
are closed. The image of any closed set in X is the image of a compact set, hence
is compact and must be closed. For (e), the answer is "no," and part (f) supplies a
counterexample. For (f), the function $p : X \to S^1$ is continuous, and Proposition
10.38a produces a continuous function $p_0 : X/\!\sim\ \to S^1$ such that $p = p_0 \circ q$, where
q is the quotient map. Then $p_0$ is continuous and one-one from a compact space onto
a Hausdorff space and must be a homeomorphism.


12-13. The proofs are the same as in Section II.8. 
14. This is proved in the same way as in Problems 13 and 11 in Chapter II. 


15. For (a), call the relation ~. This is certainly reflexive and symmetric. For 
transitivity let x ~ y and y ~ z. Then x and y lie in a connected set E, and y and
z lie in a connected set F. The sets E and F have y in common, and Problem 13a 
shows that E U F is connected. Thus x ~ z. Part (b) is immediate from Problem 
13b. For (c), let x be given, and let U be a connected neighborhood of x. Then U 
lies in the component of x. Thus the component of x is a neighborhood of each of its 
points and is therefore open. 


16. Form the class $\mathcal{C}$ of all functions $F$ as described, including the empty function,
and order the class by inclusion; for the purposes of the ordering, each function is
to be regarded as a set of ordered pairs. The class $\mathcal{C}$ is nonempty since the empty
function is in it. If we have a chain in $\mathcal{C}$, we form the union $F$ of the functions in the
chain. We show that $F$ is an upper bound for the chain. To do so, we need to see
that the indicated sets cover $X$. Thus let $x \in X$ be given. Only finitely many sets $U$
in $\mathcal{U}$ contain $x$, by assumption. Say these are $U_1, \dots, U_n$. If one of these fails to be
in the domain of $F$, then $x$ lies in $\bigcup_{V \in \mathcal{U},\, V \notin \mathrm{domain}(F)} V$, and $x$ is covered. Thus all
of $U_1, \dots, U_n$ may be assumed to be in the domain of $F$. Each $U_i$ is in the domain
of some function $F_j$ in the chain, and all of them are in the domain of the largest of
the $F_j$’s, say $F_0$. Since $x$ is not in $\bigcup_{V \in \mathcal{U},\, V \notin \mathrm{domain}(F)} V$ and every member of $\mathcal{U}$
containing $x$ is in the domain of $F_0$, it is not in the larger union
$\bigcup_{V \in \mathcal{U},\, V \notin \mathrm{domain}(F_0)} V$. Thus it must be in $\bigcup_{U \in \mathrm{domain}(F_0)} F_0(U)$. Since $F_0(U) \subseteq U$ for
each $U$, $x$ must lie in some $F_0(U_j)$. Then $x$ lies in $F(U_j)$, and $F$ is an upper bound
for the chain.

By Zorn’s Lemma let $F$ be a maximal element in $\mathcal{C}$. To complete the argument,
we show that every set in $\mathcal{U}$ lies in domain($F$). Suppose that $U_0$ is a set in $\mathcal{U}$ that is not
in domain($F$). Let $U'$ be the union of all $F(U)$ for $U$ in domain($F$) and all $V$ other
than $U_0$ that are not in domain($F$). Since $F$ is in $\mathcal{C}$, $U' \cup U_0 = X$. Hence $(U')^c$ is a
closed subset of the open set $U_0$. Since $X$ is normal, we can find an open set $W$ such
that $(U')^c \subseteq W \subseteq W^{\mathrm{cl}} \subseteq U_0$. If we define $F(U_0) = W$, then we succeed in enlarging
the domain of $F$, in contradiction to the maximality of $F$. Hence every member of $\mathcal{U}$
lies in domain($F$), as asserted.


17. Form the open sets $V_U$ as in the previous problem. For each $U$ in $\mathcal{U}$, apply
Urysohn’s Lemma to find a continuous function $g_U : X \to [0, 1]$ with $g_U$ equal to 1
on $V_U$ and equal to 0 on $U^c$. The open cover $\{V_U\}$ is locally finite since $\mathcal{U}$ is locally
finite. Therefore $g = \sum_{U \in \mathcal{U}} g_U$ is a continuous function on $X$. Since $g_U$ is positive
on $V_U$ and the sets $V_U$ cover $X$, $g$ is everywhere positive. Therefore the functions
$\varphi_U = g_U/g$ have the required properties.


18. If $c_0 = 0$, take $F_0 = 0$. If $c_0 \ne 0$, apply Urysohn’s Lemma to obtain a
continuous function $h$ with values in $[0, 1]$ that is 1 on $P_0$ and is 0 on $N_0$, and then
put $F_0 = \frac{2c_0}{3}h - \frac{c_0}{3}$.

19. On $P_0 \cap C$, $g_0$ is $\ge c_0/3$ and $F_0$ is $c_0/3$. Therefore $g_0 - F_0$ is $\ge 0$ and
$\le 2c_0/3$. Similarly on $N_0 \cap C$, $g_0 - F_0$ is $\le 0$ and $\ge -2c_0/3$. Elsewhere on $C$,
$g_0$ and $F_0$ are both between $-c_0/3$ and $c_0/3$, and hence $|g_0 - F_0| \le 2c_0/3$. Thus
$|g_0 - F_0| \le 2c_0/3$ everywhere on $C$. The function $F_1$ is continuous from $X$ into $\mathbb{R}$,
has $|F_1| \le \frac13(\frac23 c_0)$, and takes the value $c_1/3 \le \frac13(\frac23 c_0)$ on $\{x \in C \mid g_1(x) \ge c_1/3\}$ and
the value $-c_1/3$ on $\{x \in C \mid g_1(x) \le -c_1/3\}$.

20. Iteration produces continuous functions $F_n : X \to \mathbb{R}$ with $|F_n(x)| \le \frac13(\frac23)^n c_0$
for all $x$ in $X$ and $|f(x) - \sum_{j=0}^{n-1} F_j(x)| \le (\frac23)^n c_0$ for all $x$ in $C$. Let $F(x) =
\sum_{n=0}^{\infty} F_n(x)$. The series converges uniformly on $X$ by the estimate on $F_n(x)$ and the
Weierstrass M test, and Proposition 10.30 shows that $F$ is continuous on $X$. If we let
$n$ tend to infinity in the estimate on $f(x) - \sum_{j=0}^{n-1} F_j(x)$, we see that $F$ and $f$ agree
on $C$. Finally for $x$ in $X$,

$$|F(x)| \le \sum_{n=0}^{\infty} |F_n(x)| \le \sum_{n=0}^{\infty} \tfrac13(\tfrac23)^n c_0 = c_0 = \sup_{y \in C} |f(y)|.$$

Thus $|F|$ and $|f|$ have the same supremum.
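The final estimate rests on the geometric series identity $\sum_{j=0}^{n-1} \frac13(\frac23)^j = 1 - (\frac23)^n$. As a quick sanity check, the following sketch sums the bounds exactly with rational arithmetic; the function names are hypothetical and the value of $c_0$ is an arbitrary assumption, not something taken from the problem.

```python
from fractions import Fraction


def bound_partial_sum(num_terms, c0=Fraction(1)):
    """Sum of the bounds (1/3)(2/3)^n * c0 for n = 0, ..., num_terms - 1."""
    return sum(Fraction(1, 3) * Fraction(2, 3) ** n * c0 for n in range(num_terms))


def bound_tail(num_terms, c0=Fraction(1)):
    """What is left of c0 after the first num_terms bounds have been summed."""
    return c0 - bound_partial_sum(num_terms, c0)


if __name__ == "__main__":
    # The tail equals (2/3)^n * c0, so the partial sums increase to c0.
    for n in (1, 5, 20):
        print(n, bound_partial_sum(n), bound_tail(n))
```

The tail after $n$ terms is exactly $(\frac23)^n c_0$, which matches the estimate on $|f(x) - \sum_{j<n} F_j(x)|$ and explains why the total bound is $c_0$.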


21. Every open interval is in the base and hence is open. The closed interval
$\{a \le x \le b\}$ is the complement of the open set $\{x < a\} \cup \{b < x\}$ and is therefore
closed.


22. Let $a < b$ be given. If there exists a $c$ with $a < c < b$, then the open sets
$\{x < c\}$ and $\{c < x\}$ separate $a$ and $b$; otherwise the open sets $\{x < b\}$ and $\{a < x\}$
separate them. Hence $X$ is Hausdorff.

Let $a$ and a closed set $F$ be given with $a$ not in $F$. Since $F^c$ is a neighborhood
of $a$, there exists a basic open set $B$ containing $a$ that is disjoint from $F$. If $B$ has
some element larger than $a$, let $d$ be such an element; otherwise let $d$ be undefined.
If $B$ has some element smaller than $a$, let $c$ be such an element; otherwise let $c$ be
undefined. If $c$ and $d$ are both defined, then $F \subseteq \{x < c\} \cup \{d < x\}$, while $a$ is in
$\{c < x < d\}$. If $c$ is not defined but $d$ is defined, then $F \subseteq \{x < a\} \cup \{d < x\}$, while
$a$ is in $B \cap \{x < d\}$. If $d$ is not defined but $c$ is defined, we argue symmetrically. If
neither $c$ nor $d$ is defined, then $B = \{a\}$ is open and closed; hence $B^c$ and $B$ are the
required open sets separating $F$ and $a$.


23. Suppose that any nonempty set with an upper bound has a least upper bound,
and let $E$ be a set with a lower bound. We are to produce a greatest lower bound. Let
$F$ be the set of all lower bounds for $E$. This is nonempty, and all elements of $F$ are
$\le e$, where $e$ is an element of $E$. So $F$ has an upper bound. Let $c$ be a least upper
bound. We show that $c$ is a greatest lower bound for $E$.

If $c$ is not a lower bound for $E$, then $E$ has some $e$ with $e \le c$, $e \ne c$, i.e., with
$e < c$. All $f$ in $F$ have $f \le e < c$. So $e$ is a smaller upper bound for $F$, contradiction.
Thus $c$ is a lower bound for $E$. If there is some greater lower bound, say $d$, then
$c < d \le e$ for all $e$ in $E$. This implies that $d$ is in $F$, and hence $c$ is not an upper
bound for $F$.


24. In (a), suppose that $Y$ is nonempty closed and has an upper bound and a lower
bound. We are to prove that $Y$ is compact. It is enough to handle a set $Y = [a, b]$.
Let an open cover $\mathcal{U}$ of $Y$ be given, and suppose there is no finite subcover. Let $E$ be
the set of all $x$ in $[a, b]$ such that some finite subcollection from $\mathcal{U}$ covers $[a, x]$. Then
$a$ is in $E$. Since $E$ is nonempty and has $b$ as an upper bound, the order completeness
shows that $E$ has a least upper bound $c$. Since we are assuming that $\mathcal{U}$ has no finite
subcover of $[a, b]$, $E^c \cap [a, b]$ is nonempty. This set has a lower bound, namely $a$,
and therefore it has a greatest lower bound $d$.

If $e$ is in $E$ and $f$ is in $E^c \cap [a, b]$, then $e < f$. So $e \le d$, and then $c \le d$. Suppose
$c < d$. Then $c$ must be in $E$. Any $x$ with $c < x < d$ cannot be in $E$ or $E^c$, and
hence there is no such $x$. Then a finite subclass of $\mathcal{U}$ that covers $[a, c]$, together with
a member of $\mathcal{U}$ that contains $d$, is a finite open subcover for $[a, d]$ and contradicts the
fact that $d$ is not in $E$. Thus $c = d$.

Now suppose that $c$ is in $E^c \cap [a, b]$. Since $c = d$, $E$ has no largest element.
Choose a member $U$ of $\mathcal{U}$ containing $c$, and find a basic open neighborhood $B$ of $c$
contained in $U$. Then $B \cap E$ must contain some $c'$ with $c' < c$. A finite subclass of
$\mathcal{U}$ covers $[a, c']$, and $U$ covers $[c', c]$. Thus $c$ is in $E$, and we have a contradiction.

We conclude that $c$ is in $E$. Since $c = d$, $E^c \cap [a, b]$ has no smallest element.
Choose a member $U$ of $\mathcal{U}$ containing $c$, and find a basic open neighborhood $B$ of $c$
contained in $U$. Then $B \cap (E^c \cap [a, b])$ must contain an element $c'$ with $c < c'$,
and then there must be some $c''$ with $c < c'' < c'$. A finite subclass of $\mathcal{U}$ that covers
$[a, c]$, together with the set $U$, then covers $[a, c'']$ and shows that $c''$ is in $E$. This
contradicts the fact that $c$ is an upper bound of $E$.

In (b), let $x$ be given in $X$. If $a < x < b$ for some $a$ and $b$, then $[a, b]$ is the
required compact neighborhood of $x$. If $x$ is a lower bound for $X$ and there exists $b$
with $x < b$, then $[x, b]$ is the required compact neighborhood. If $x$ is an upper bound
for $X$ and there exists $a$ with $a < x$, then $[a, x]$ is the required compact neighborhood.
Since $X$ has at least two members, there are no other possibilities. So $X$ is locally
compact.


25. In (a), the sets {x < b} and {a < x} are open and disjoint, contain a and b 
respectively, and have union X. Thus X is disconnected. 

In (b), suppose that $X$ is order complete and has no gaps. Assume, on the contrary,
that $U$ and $V$ are disjoint nonempty open sets with union $X$. Say that $u < v$ for some
$u$ in $U$ and $v$ in $V$. It will be convenient to assume that $u$ is not the smallest element in
$X$ and $v$ is not the largest; when this assumption is not in place, the same line of proof
works except that one may below have to use basic open sets of the form $\{r < x\}$ and
$\{x < s\}$, as well as $\{r < x < s\}$.

Form the set $S$ of all $x \in X$ with $x < v$ and $(x, v] \subseteq V$. This set has $u$ as a
lower bound, and we let $b$ be the greatest lower bound. Then $u \le b \le v$. First
suppose that $b$ is in $V$. Choose a basic open set $(r, s) \subseteq V$ with $r < b < s$; this is
possible by our temporary assumption because $V$ is open. Then $(\max\{u, r\}, v] \subseteq V$.
If $\max\{u, r\} < b$, then $\max\{u, r\}$ is in $S$ and $b$ is not a lower bound for $S$; thus
$b \le \max\{u, r\}$, i.e., $b = u$. This is impossible since $b$ is assumed to be in $V$. We
conclude that $b$ is in $U$. Choose a basic open set $(r, s) \subseteq U$ with $r < b < s$; again
this is possible by our temporary assumption because $U$ is open. Since there are no
gaps, we can find $s'$ with $b < s' < s$. Then $\min\{v, s'\}$ is a lower bound for $S$, and
$b$ cannot be the greatest lower bound unless $\min\{v, s'\} \le b$, i.e., $b = v$. This is
impossible since $b$ is assumed to be in $U$, and we have arrived at a contradiction.


26. As an ordered set, X is the same as R, and hence its order topology is the same 
as for R, which is connected. In its relative topology, X is disconnected, being the 
disjoint union of the open sets [0, 1) and [2, 3). 

27. The subset [0, 1) is closed, being the intersection of all sets {x | x < y} 
for y € (1,2]. Similarly (1, 2] is closed. Hence they are both open, and X is 
disconnected. It follows immediately from the definition that there are no gaps. 


28. If a nonempty subset of points $(x, y)$ is given, let $x_0$ be the least upper bound
of the $x$’s. If no $(x_0, y)$ is in the set, then $(x_0, 0)$ is the least upper bound for the set.
If some $(x_0, y)$ is in the set, let $y_0$ be the least upper bound of the $y$’s. Then $(x_0, y_0)$
is the least upper bound of the set. We conclude that $X$ is order complete. Problem
24a then shows that $X$ is compact. This proves the compactness in (a). There are no
gaps, and Problem 25b thus proves the connectedness. For each $x \in [0, 1]$, the set
$\{(x, y) \mid 0 < y < 1\}$ is open. Thus we have an uncountable disjoint union of open
sets, and $X$ cannot be separable. Part (b) is handled in the same way.


Chapter XI 


1. In (a), every compact subset of $X$ is compact when viewed as in $X^*$, and this
gives inclusion in one direction. In the reverse direction it is enough to show that
when $U$ is open in $X^*$, then $U - \{\infty\}$ is a Borel set in $X$. Since $X$ is $\sigma$-compact,
we can choose an increasing sequence of compact sets $K_n$ with $K_n \subseteq K_{n+1}^o$ and
$\bigcup_{n=1}^{\infty} K_n = X$. Then $U \cap K_n^o$ is open and bounded, hence is a Borel subset of $X$.
The countable union of these sets is $U - \{\infty\}$, and hence $U - \{\infty\}$ is a Borel set. In (b), the Borel
sets of $X$ are the countable sets and their complements. However, every subset $U$ of
$X$ is open in $X$ and therefore open in $X^*$. Its complement in $X^*$ is compact and is a
Borel set in $X^*$. Thus $U$ is a Borel set in $X^*$.


2. Part (a) of the previous problem shows that every open subset of X is a Borel 
set, and hence every continuous function is a Borel function. 


3. Use the regularity to show that the conclusion holds for indicator functions and 
hence simple functions. Then pass to the limit. 


4. Let $I_E$ be an indicator function. Given $\epsilon > 0$, find by regularity a compact set
$L$ and an open set $U$ with $L \subseteq E \subseteq U$ and $\mu(U - L) < \epsilon$. The compact set $K$ will
be $K = (U - L)^c = L \cup U^c$. Thus consider the restriction of $I_E$ to the compact set
$K$. Let $x$ be in $K$. If $x$ is in $E$, then $x$ is in $L$. The set $U \cap K = L$ is a relatively
open neighborhood of $x$, and $I_E$ is identically 1 on this. Hence the restriction of $I_E$
to $K$ is continuous at the points of $E$. Similarly if $x$ is in $E^c$, then $x$ is in $U^c$. The set
$L^c \cap K = U^c$ is a relatively open neighborhood of $x$, and we argue similarly. This
handles indicator functions, and the result for simple functions follows immediately.

Next suppose that $f$ is a real-valued Borel function $\ge 0$. Choose an increasing
sequence of simple functions $s_n \ge 0$ with limit $f$. Let $\epsilon > 0$ be given, and find,
by Egoroff’s Theorem, a Borel set $E$ with $\mu(E^c) < \epsilon$ such that $\lim s_n(x) = f(x)$
uniformly for $x$ in $E$. Next find, for each $n$, a compact subset $K_n$ of $X$ with $\mu(K_n^c) <
\epsilon/2^n$ such that $s_n|_{K_n}$ is continuous. The set $F = E \cap (\bigcap_{n=1}^{\infty} K_n)$ has complement of
measure $< 2\epsilon$, and the restriction of every $s_n$ to $F$ is continuous. Since $\{s_n\}$ converges
to $f$ uniformly on $E$, the restriction of $f$ to $F$ is continuous. Using regularity once
more, we can find a compact subset $K_0$ of $F$ such that $\mu(F - K_0) < \epsilon$. Then
$\mu(K_0^c) < 3\epsilon$, and the restriction of $f$ to $K_0$ is continuous.


5. In (a), any rotation preserves Euclidean distances and fixes the origin. Since 
$S_{ab}$ is exactly the set of points whose distance $d$ from the origin has $a < d < b$,
$S_{ab}$ is mapped to itself. Part (b) follows from the change-of-variables formula
(Theorem 6.32). The determinant that enters the formula is the determinant of the 
matrix of the rotation and is 1. The first conclusion of (c) is what the change-of- 
variables formula gives for the transformation to spherical coordinates when applied 
to the set S,, if we take Fubini’s Theorem into account. It yields J. Sep LF dx = 


([? 2 dr) (fo Lf dw) = (fer an (fe Lf dw). Since [oredr is not zero, we 
can divide by it and obtain the second conclusion of (c). Part (d) is proved by setting 
it up to be a special case of the uniqueness in Theorem 11.1. 


6. In (a), monotonicity of $\mu$ gives $\mu(K) \le \inf_\alpha \mu(K_\alpha)$. Suppose that $<$ holds.
Choose by regularity an open set $U$ containing $K$ such that $\mu(U) < \inf_\alpha \mu(K_\alpha)$. The
sets $K_\alpha^c$ form an open cover of the compact set $U^c$, and there is a finite subcover. The
intersection of the complements is one of the sets $K_{\alpha_0}$, and it has the property that
$K_{\alpha_0} \subseteq U$. Monotonicity then gives $\mu(K_{\alpha_0}) \le \mu(U)$, and thus $\inf_\alpha \mu(K_\alpha) \le \mu(U)$,
contradiction.

For (b), consider all compact subsets $K$ of $X$ for which $\mu(K) = 1$. The intersection
of any two of these is again one by Lemma 11.9. If $K_0$ is the intersection of all of them,
then $K_0$ is compact, and (a) shows that $\mu(K_0) = 1$. If $K_0$ contains two distinct points $x$
and $y$, find disjoint open neighborhoods $U_x$ and $U_y$. Then $K_0 = (K_0 - U_x) \cup (K_0 - U_y)$
exhibits $K_0$ as the union of two proper compact sets. At least one of them must have
measure 1, and then $K_0$ is shown not to be the intersection of all compact subsets of
measure 1.

In (c) let $K_0$ be any compact $G_\delta$, and choose a decreasing sequence $\{f_n\}$ in $C(X)$
with limit $I_{K_0}$. Passing to the limit from the formula $\int_X f_n^2\,d\mu = (\int_X f_n\,d\mu)^2$, we
obtain $\mu(K_0) = \mu(K_0)^2$. Thus $\mu(K_0)$ is 0 or 1. By regularity, $\mu$ takes only the
values 0 and 1, and (b) shows that $\mu$ is a point mass.

For (d), apply Theorem 11.1 and obtain the regular Borel measure $\mu$ corresponding
to $\ell$. Then $\mu$ has the property in (c) and must be a point mass.


7. The statement for (a) is that $u(r, \theta)$ is the Poisson integral of a signed or complex
Borel measure on the circle if and only if $\sup_{0 \le r < 1} \|u(r, \theta)\|_{1,\theta}$ is finite. The necessity
is proved in the same way as in Problem 7 at the end of Chapter IX. The sufficiency
is proved in the same way as in Problem 8 in that group, except that the weak-star
convergence is in M(circle) relative to C(circle). For (b), expand $u(r, \theta)$ in series as
in Problem 13 at the end of Chapter IV. Since $u$ is nonnegative, the $L^1$ norm over any
circle centered at the origin is just the integral, and the result of integrating in $\theta$ is
that the $n = 0$ term is picked out. Thus $\|u(r, \theta)\|_{1,\theta} = c_0$ for every $r$, where $c_0$ is the
coefficient of the $n = 0$ term. The condition
in (a) is satisfied, and $u$ is therefore the Poisson integral of a Borel complex measure.
Examination of the proof of (a) shows that the complex measure is a measure.


8. Order topologies are always Hausdorff. Since $\Omega^*$ has a smallest element and a
largest element, Problems 23 and 24 of Chapter X show that $\Omega^*$ is compact if every
nonempty subset has a least upper bound. Since the ordering for $\Omega^*$ has the property
that every nonempty subset has a least element, the existence of least upper bounds
is satisfied.


9. First we prove that the intersection of any two uncountable relatively closed
sets $C$ and $D$ is uncountable. Assume the contrary. Since $C \cap D$ is countable and
the countable union of countable sets is countable, there is some countable ordinal $w$
greater than all members of $C \cap D$. Since $C$ and $D$ are uncountable, we can find a
sequence $w < \alpha_1 < \beta_1 < \alpha_2 < \beta_2 < \cdots$ such that each $\alpha_j$ is in $C$ and each $\beta_j$ is in
$D$. The least ordinal $\gamma$ greater than or equal to all members of the sequence is a countable
ordinal and has to be a limit point of both $C$ and $D$. Since $C$ and $D$ are closed, $\gamma$
is in $C \cap D$. But $C \cap D$ was supposed to have no ordinals greater than $w$. This
contradiction shows that $C \cap D$ is uncountable, and of course it is relatively closed
also.

Now let a sequence of uncountable relatively closed sets $C_n$ be given. By the
previous step we may assume that they are decreasing with $n$. If $\bigcap_{n=1}^{\infty} C_n = C$ is
countable, then there is some countable ordinal $w$ greater than all members of $C$.
Replacing $C_n$ by $C_n \cap \{x \ge w\}$ we may assume that the $C_n$ have empty intersection.
Let $\alpha_n$ be the least member of $C_n$. The result is a monotone increasing sequence since
the $C_n$ are decreasing. If $\alpha$ is the least ordinal $\ge$ all $\alpha_n$, then $\alpha$ is a countable ordinal.
It is a limit point of each $C_n$, hence lies in each $C_n$. The existence of $\alpha$ contradicts
the fact that the $C_n$ have been adjusted to have empty intersection. This contradiction
shows that $\bigcap_n C_n$ is uncountable.


10. For additivity the question is whether the union of two sets that fail to meet 
the condition of the previous problem can meet the condition. The answer is no 
because the previous problem shows that the intersection of any two sets meeting the 
condition again meets the condition. The complete additivity is then a consequence 
of Corollary 5.3 and the result of the previous problem. The measure $\mu$ takes on only
the values 0 and 1 and yet is not a point mass because one-point sets do not satisfy
the defining property for measure 1. Problem 6b therefore allows us to conclude that
$\mu$ is not regular.

11. Let $\mu$ be a Borel measure on $X$, and let $S$ be the set of all regular Borel measures
$\nu$ with $\nu \le \mu$. This contains 0 and hence is nonempty. Order $S$ by saying that $\nu_1 \le \nu_2$
if $\nu_1(E) \le \nu_2(E)$ for all $E$. If we are given a chain $\{\nu_\alpha\}$, let $C = \sup_\alpha \nu_\alpha(X)$. This
is $\le \mu(X)$ and hence is finite. Choose a sequence $\{\nu_{\alpha_k}\}$ from the chain with $\nu_{\alpha_k}(X)$
monotone increasing with limit $C$. Then $\nu_{\alpha_k}(E)$ is monotone increasing for every
Borel set $E$, and we define $\nu(E)$ to be its limit. The complete additivity of $\nu$ follows
from Corollary 1.14, and it is easy to check that $\nu_\alpha \le \nu \le \mu$ for all $\alpha$. We have to
check that $\nu$ is regular. Let $\epsilon > 0$ be given, and choose $\nu_{\alpha_k}$ with $\nu_{\alpha_k}(X) \ge \nu(X) - \epsilon$. If
$E$ is given, find $K$ and $U$ with $K \subseteq E \subseteq U$, $K$ compact, $U$ open, and $\nu_{\alpha_k}(U - K) < \epsilon$.
Then

$$\begin{aligned}
\nu_{\alpha_k}(U - K) + \nu((U - K)^c) + \epsilon &\ge \nu_{\alpha_k}(U - K) + \nu_{\alpha_k}((U - K)^c) + \epsilon \\
&= \nu_{\alpha_k}(X) + \epsilon \ge \nu(X) = \nu(U - K) + \nu((U - K)^c),
\end{aligned}$$

and hence $\nu_{\alpha_k}(U - K) + \epsilon \ge \nu(U - K)$. Since $\nu_{\alpha_k}(U - K) < \epsilon$, we obtain
$\nu(U - K) < 2\epsilon$. Thus $\nu$ is regular. The decomposition readily follows.


12. This follows immediately from Proposition 11.20. 


13. Let $\mu = \mu_r + \mu_p = \nu_r + \nu_p$ with $\mu_r$ and $\nu_r$ regular and with $\mu_p$ and $\nu_p$ purely
irregular. Write $\sigma = \mu_r - \nu_r = \nu_p - \mu_p$ in terms of its Jordan decomposition as
$\sigma = \sigma^+ - \sigma^-$. Then $\sigma^+ \le \mu_r$ and $\sigma^- \le \nu_r$, and hence $\sigma^+$ and $\sigma^-$ are regular
by Proposition 11.20. Also, $\sigma^+ \le \nu_p$ and $\sigma^- \le \mu_p$, and the definition of “purely
irregular” forces $\sigma^+$ and $\sigma^-$ to be 0. Then $\mu_r = \nu_r$ and $\mu_p = \nu_p$.

14. Let $\mu$ be as in Problem 10, and suppose that $\nu$ is a regular Borel measure with
$\nu \le \mu$. Since $\nu(\{\infty\}) = 0$, Problem 6a shows that $\lim_{\alpha \uparrow \infty} \nu(\{x \ge \alpha\}) = 0$. For each
$n$, let $\alpha_n$ be the least ordinal such that $\nu(\{x \ge \alpha_n\}) < 1/n$. The least ordinal $\ge$ all
$\alpha_n$ is a countable ordinal $\beta$, and $\nu(\{x \ge \beta\}) = 0$. Since $\{x < \beta\}$ is a countable set,
$\mu(\{x < \beta\}) = 0$. Therefore $\nu(\{x < \beta\}) = 0$, and we conclude that $\nu = 0$.

16. For the regularity any set in $\mathcal{F}$ is in some $\mathcal{F}_n$. The sets in $\mathcal{F}_n$ are of the form
E=Ex Ce eon) with E C Q” and v(E) = v,(E). Given € > 0, choose 
K compact and U open in Q” with K C E C U and v,(U — K) < €. nQ,K is 


compact, U is open, K C EF CU,andv(U — K) <e. 


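17. Let $E = \bigcup_{n=1}^{\infty} E_n$ disjointly in $\mathcal{F}$. Since $\nu$ is nonnegative additive, we
have $\sum_{n=1}^{\infty} \nu(E_n) \le \nu(E)$. For the reverse inequality let $\epsilon > 0$ be given. Choose $K$
compact and $U_n$ open with $K \subseteq E$, $E_n \subseteq U_n$, $\nu(U_n - E_n) < \epsilon/2^n$, and $\nu(E - K) < \epsilon$.
Then $K \subseteq \bigcup_{n=1}^{\infty} U_n$, and the compactness of $K$ forces $K \subseteq \bigcup_{n=1}^{N} U_n$ for some $N$.
Then $\nu(E) \le \nu(K) + \epsilon \le \nu(\bigcup_{n=1}^{N} U_n) + \epsilon \le \sum_{n=1}^{N} \nu(U_n) + \epsilon \le \sum_{n=1}^{N} \nu(E_n) + 2\epsilon \le
\sum_{n=1}^{\infty} \nu(E_n) + 2\epsilon$. Since $\epsilon$ is arbitrary, $\nu(E) \le \sum_{n=1}^{\infty} \nu(E_n)$.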


18. The key is that $\Omega$ is a separable metric space. Every open set is therefore the
countable union of basic open sets, which are in the various $\mathcal{F}_n$’s.


Chapter XII 


1. In (a), the closed ball is closed and contains the open ball; also every point
of the closed ball is a limit point of the open ball since $\|x_1 - x_0\| = r$ implies that
$\|[(1 - \frac1n)(x_1 - x_0) + x_0] - x_0\| = (1 - \frac1n)\|x_1 - x_0\| < r$ and $\lim_n [(1 - \frac1n)(x_1 - x_0) + x_0] =
x_1$.

For (b), let the closed balls be $B(r_n; x_n)^{\mathrm{cl}}$. If $m \ge n$, then $\|x_m - x_n\| \le r_n$ since
$B(r_m; x_m)^{\mathrm{cl}} \subseteq B(r_n; x_n)^{\mathrm{cl}}$. Let $r = \lim_n r_n$. If $r = 0$, then $\{x_n\}$ is Cauchy and hence
is convergent. In this case if $x = \lim x_n$, then $\|x - x_n\| \le r_n$ for all $n$, and hence $x$ is
in $B(r_n; x_n)^{\mathrm{cl}}$ for all $n$. If $r > 0$, fix $n_0$ large enough so that $r_{n_0} \le 3r/2$. It is enough
to show that $x_{n_0}$ is in $B(r_n; x_n)^{\mathrm{cl}}$ for $n \ge n_0$. We may assume that $x_n \ne x_{n_0}$. The
members of $B(r_n; x_n)^{\mathrm{cl}}$ are the vectors of the form $x_n + v$ with $\|v\| \le r_n$, and these are
assumed to lie in $B(r_{n_0}; x_{n_0})^{\mathrm{cl}}$. Therefore $\|(x_n - x_{n_0}) + v\| \le r_{n_0}$ for all such $v$. Take
$v = r_n \|x_n - x_{n_0}\|^{-1}(x_n - x_{n_0})$. Then $r_{n_0} \ge \|x_n - x_{n_0} + v\| = \|(1 + r_n\|x_n - x_{n_0}\|^{-1})(x_n - x_{n_0})\| =
\|x_n - x_{n_0}\| + r_n$. So $\|x_n - x_{n_0}\| \le r_{n_0} - r_n \le \frac32 r - r = \frac12 r < r \le r_n$, as required.


2. Reduce to the real-valued case, and there use Theorem 1.23 and the remarks at 
the end of Section A3 of the appendix. 


3. Convergence in either case is uniform convergence. For $H^\infty(D)$, suppose
therefore that $\{\sum_{k=0}^{\infty} c_k^{(n)} z^k\}$ is a Cauchy sequence in $H^\infty(D)$ indexed by $n$. Write
$z = re^{i\theta}$, multiply by $e^{-im\theta}$, and integrate in $\theta$ from $-\pi$ to $\pi$. The result is that
$\{c_m^{(n)} r^m\}$ is Cauchy in $n$ for each $r < 1$ and each $m$. Then $\lim_n c_m^{(n)} r^m = c_m r^m$
exists for each $r$ and $m$. Taking $r = 1/2$, we see that $\lim_n c_m^{(n)} = c_m$ exists for
each $m$. Arguing as in the proof of Theorem 1.37, we see that $f(z) = \sum_{k=0}^{\infty} c_k z^k$
is convergent for $|z| < 1$ and that the sequence of functions $f_n(z) = \sum_{k=0}^{\infty} c_k^{(n)} z^k$
converges to it pointwise. Since $\{f_n\}$ is uniformly Cauchy and pointwise convergent
to $f$, it converges uniformly to $f$. For the vector subspace $A(D)$, we have $A(D) =
H^\infty(D) \cap C(D^{\mathrm{cl}})$. Hence $A(D)$ is a closed subspace of $H^\infty(D)$.


4. In (a), let us check the triangle inequality. For $y \in Y$, we have $\|a + b + y\| \le
\|a + y'\| + \|b + (y - y')\|$ for all $y' \in Y$. Comparing the definition of $\|a + b + Y\|$
with the left side, we obtain $\|a + b + Y\| \le \|a + y'\| + \|b + (y - y')\|$ for all $y$ and
$y'$ in $Y$. Thus $\|a + b + Y\| \le \|a + y'\| + \|b + y''\|$ for all $y'$ and $y''$ in $Y$. Taking the
infimum over $y'$ and $y''$ gives the desired conclusion.

In (b), let a Cauchy sequence in $X/Y$ be given. It is enough to prove that some
subsequence is convergent. Thus it is enough to prove that if $\{x_n\}$ is a sequence in
$X$ with $\|x_n - x_{n+1} + Y\| < 2^{-n}$, then $\{x_n + Y\}$ is convergent in $X/Y$. We define a
sequence $\{\tilde{x}_n\}$ in $X$ with $\tilde{x}_n = x_n - y_n$ and $y_n$ in $Y$ such that $\|\tilde{x}_n - \tilde{x}_{n+1}\| < 2 \cdot 2^{-n}$. It
is then easy to check that $\{\tilde{x}_n\}$ is Cauchy in $X$ and that if $x'$ is its limit, then $\{x_n + Y\}$
tends to $x' + Y$. To define the $y_n$’s, we proceed inductively, starting with $y_1 = 0$.
If $y_1, \dots, y_n$ have been defined such that $\|\tilde{x}_k - \tilde{x}_{k+1}\| < 2 \cdot 2^{-k}$ for $k < n$, choose
$y_{n+1}$ in $Y$ such that $\|\tilde{x}_n - x_{n+1} + y_{n+1}\| \le \|x_n - x_{n+1} + Y\| + 2^{-n} < 2 \cdot 2^{-n}$. Then
$\tilde{x}_{n+1} = x_{n+1} - y_{n+1}$ has $\|\tilde{x}_n - \tilde{x}_{n+1}\| < 2 \cdot 2^{-n}$, and the induction is complete.


5. In (a), we have $c^t G(v_1, \dots, v_n)\bar{c} = \sum_{i,j} c_i (v_i, v_j) \bar{c}_j = \sum_{i,j} (c_i v_i, c_j v_j) =
(\sum_i c_i v_i, \sum_j c_j v_j) = \|\sum_i c_i v_i\|^2$. In (b), $G(v_1, \dots, v_n)$ is Hermitian, and thus
the finite-dimensional Spectral Theorem says that there exists a unitary matrix
$u = [u_{ij}]$ with $u^{-1}G(v_1, \dots, v_n)u$ diagonal, say $= \mathrm{diag}(d_1, \dots, d_n)$. Then $d_j =
e_j^t u^{-1} G(v_1, \dots, v_n) u e_j$, and this, by (a), equals $\|\sum_i c_i v_i\|^2$ with $c = \bar{u}e_j$. Hence
$d_j \ge 0$. In (c), we have $\det G(v_1, \dots, v_n) = \det(u^{-1}G(v_1, \dots, v_n)u) = d_1 d_2 \cdots d_n
\ge 0$ with equality if and only if some $d_j$ is 0. If $d_j = 0$, then $\sum_i c_i v_i = 0$
for $c = \bar{u}e_j$, and hence $v_1, \dots, v_n$ is dependent. Conversely if $v_1, \dots, v_n$ is
dependent, then $\sum_i c_i v_i = 0$ for some nonzero tuple $(c_1, \dots, c_n)$, and therefore
$0 = (\sum_i c_i v_i, v_j) = \sum_i c_i (v_i, v_j)$ for all $j$; this equality shows that a nontrivial linear
combination of the rows of $G(v_1, \dots, v_n)$ is 0, and hence $\det G(v_1, \dots, v_n) = 0$.
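The identity in (a) and the determinant criterion in (c) can be checked numerically on small examples. The sketch below assumes the convention that the Gram matrix has $(i, j)$ entry $(v_i, v_j)$ and that the inner product on $\mathbb{C}^n$ is linear in the first variable; all function names are hypothetical, and the cofactor determinant is only meant for tiny matrices.

```python
def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y (assumed convention)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def gram(vectors):
    """Gram matrix whose (i, j) entry is (v_i, v_j)."""
    return [[inner(vi, vj) for vj in vectors] for vi in vectors]


def quadratic_form(G, c):
    """Compute c^t G conj(c) = sum over i, j of c_i * G[i][j] * conj(c_j)."""
    n = len(c)
    return sum(c[i] * G[i][j] * c[j].conjugate() for i in range(n) for j in range(n))


def combination_norm_sq(vectors, c):
    """Squared norm of the linear combination sum_i c_i v_i."""
    dim = len(vectors[0])
    w = [sum(ci * v[k] for ci, v in zip(c, vectors)) for k in range(dim)]
    return inner(w, w).real


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


if __name__ == "__main__":
    vs = [[1 + 2j, 0j, 1j], [2 + 0j, 1 - 1j, 0j], [0j, 1j, 3 + 0j]]
    c = [1 - 1j, 2j, 0.5 + 0j]
    G = gram(vs)
    print(quadratic_form(G, c), combination_norm_sq(vs, c))  # the two values agree
    print(det(G))  # positive real part, since the vectors are independent
```

For a dependent family such as $v_2 = 2v_1$, the same `det(gram(...))` call returns 0 up to rounding, in line with (c).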


6. A single induction immediately shows the following: $\mathrm{span}\{v_1, \dots, v_k\} =
\mathrm{span}\{u_1, \dots, u_k\}$, $v_k'$ is $\ne 0$, and $v_k$ is defined. Then each $v_k$ has norm 1. If $k < l$,
then $(v_l', v_k) = (u_l - \sum_{j=1}^{l-1} (u_l, v_j)v_j, v_k) = (u_l, v_k) - (u_l, v_k) = 0$. This proves the
orthogonality.
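The induction can be mirrored by a short computation. The following is a minimal sketch of the Gram-Schmidt process on $\mathbb{C}^n$, assuming the recursion $v_k' = u_k - \sum_{j<k} (u_k, v_j)v_j$ and $v_k = v_k'/\|v_k'\|$ with an inner product that is linear in the first variable; the function names and the tolerance `1e-12` are illustrative choices, not part of the problem.

```python
import math


def inner(x, y):
    """Inner product on C^n, linear in x and conjugate-linear in y (assumed convention)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def gram_schmidt(us):
    """Return orthonormal v_1, ..., v_n with span{v_1..v_k} = span{u_1..u_k} for each k.

    Raises ValueError if some intermediate vector v_k' is (numerically) zero,
    which happens exactly when the inputs are linearly dependent.
    """
    vs = []
    for u in us:
        v_prime = list(u)
        for v in vs:
            coeff = inner(u, v)  # the coefficient (u_k, v_j)
            v_prime = [a - coeff * b for a, b in zip(v_prime, v)]
        norm = math.sqrt(inner(v_prime, v_prime).real)
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent")
        vs.append([a / norm for a in v_prime])
    return vs


if __name__ == "__main__":
    us = [[1 + 1j, 0j, 1 + 0j], [0j, 2 + 0j, 1j], [1j, 1 + 0j, 0j]]
    for v in gram_schmidt(us):
        print(v)
```

Checking `inner(v_i, v_j)` on the output gives 1 on the diagonal and 0 off it, up to rounding.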


7. Define $F$ on each $u_\alpha$ to be the vector $v_\alpha$ given in the statement of the problem,
and extend $F$ linearly to a mapping defined on the linear span $V$ of $\{u_\alpha\}$. Corollary
12.8c shows that $\|F(u)\|_{H_2} = \|u\|_{H_1}$ for $u$ in $V$. Corollary 12.8b shows that $V$ is
dense. Proposition 2.47 shows that $F$ extends to a bounded linear operator from $H_1$
into $H_2$ satisfying $\|F(u)\|_{H_2} = \|u\|_{H_1}$ for $u$ in $H_1$. Arguing in the same way with
$F^{-1}$ proves that $F$ is onto $H_2$. The second conclusion follows by using Proposition
12.11.


8. In (a), the boundedness is elementary, and the operator norm is $\|f\|_\infty$. In (b),
the adjoint is multiplication by the complex conjugate of $f$.

9. The linear span $V$ of $\{x_n\}$ is a separable vector subspace. Suppose that it is not
dense. Choose by Corollary 12.15 a member $x^* \ne 0$ of $X^*$ with $x^*(V) = 0$. Since
$\{x_n^*\}$ is dense, choose a subsequence $\{x_{n_k}^*\}$ with $x_{n_k}^* \to x^*$. Then

$$\|x^* - x_{n_k}^*\| \ge |(x^* - x_{n_k}^*)(x_{n_k})| = |x_{n_k}^*(x_{n_k})| \ge \tfrac12 \|x_{n_k}^*\|.$$

Since the left side tends to 0, so does the right side. Thus $\|x_{n_k}^*\|$ tends to 0, and $x^* = 0$,
contradiction.


10. The dual of $C(S)$ is $M(S)$. Define a linear functional $x^*$ on $M(S)$ by
$x^*(\rho) = \rho(\{s_0\})$. Then $\|x^*\| = 1$, so that $x^*$ is in $M(S)^*$. Let $\delta_s$ denote a point mass
at $s$. If $x^*$ were given by integration with a continuous function $f$, then we would
have $I_{\{s_0\}}(s) = \delta_s(\{s_0\}) = x^*(\delta_s) = \int_S f\,d\delta_s = f(s)$. Thus the only possibility
would be $f = I_{\{s_0\}}$, and this is discontinuous.

11. Let $X$ and $Y$ be normed linear spaces with $X$ complete, and let $\{L_n\}$ be a family of
bounded linear operators $L_n : X \to Y$ such that $\|L_n(x)\| \le C_x$ for each $x$ in $X$.
For each $y^*$ in $Y^*$ with $\|y^*\| \le 1$, the linear functional $y^* \circ L_n$ on $X$ is bounded and
has $|y^*(L_n(x))| \le C_x$. Since $X$ is complete, the Uniform Boundedness Theorem for
linear functionals shows that $|y^*(L_n(x))| \le C\|x\|$ for all $x$. Taking the supremum
over $y^*$ and applying Corollary 12.17, we obtain $\|L_n(x)\| \le C\|x\|$, as required.


12. For $x$ in $X$ and $y$ in $Y$, we have

$$\|L_n(x) - L_m(x)\| \le \|L_n(x - y)\| + \|L_n(y) - L_m(y)\| + \|L_m(y - x)\| \le 2C\|x - y\| + \|L_n(y) - L_m(y)\|.$$

Given $x \in X$ and $\epsilon > 0$, choose $y$ in $Y$ to make the first term $< \epsilon$, and then
choose $n$ and $m$ large enough to make the second term $< \epsilon$. It follows that $\{L_n(x)\}$
is Cauchy for each $x$. Since $X'$ is complete, $L(x) = \lim_n L_n(x)$ exists for all
$x$. Continuity of addition and scalar multiplication implies that $L$ is linear. Then
$\|L(x)\| = \lim_n \|L_n(x)\| \le \liminf_n \|L_n\|\,\|x\| \le C\|x\|$. Hence $\|L\| \le C$.

13. Proposition 12.1 shows that $X^*$ is a Banach space. We identify the elements
$x_\alpha$ in $X$ with their images $\iota(x_\alpha)$ under the canonical map $\iota : X \to X^{**}$.
Corollary 12.18 shows that the element $\iota(x_\alpha)$ of $X^{**}$ has $\|\iota(x_\alpha)\| = \|x_\alpha\|$. The
hypothesis shows for each $x^*$ that $|(\iota(x_\alpha))(x^*)| = |x^*(x_\alpha)| \le C_{x^*}$ for a constant
$C_{x^*}$ independent of $\alpha$. Since $X^*$ is complete, the Uniform Boundedness Theorem
(Theorem 12.22) shows that $\|\iota(x_\alpha)\| \le C$ for a constant $C$ independent of $\alpha$. Applying
Corollary 12.18 a second time, we conclude that $\|x_\alpha\| \le C$ independently of $\alpha$.


14. For (a), let $u$ and $v$ have $\|u - x\| \le r$ and $\|v - x\| \le r$. Then the estimate
$\|(1 - t)u + tv - x\| = \|(1 - t)(u - x) + t(v - x)\| \le \|(1 - t)(u - x)\| + \|t(v - x)\| =
(1 - t)\|u - x\| + t\|v - x\| \le (1 - t)r + tr = r$ proves the convexity.

For (b), let $X$ be the space of sequences $s = \{s_n\}$ with $\|s\| = \sum_n |s_n|$. Let $E_k$ be
the set of sequences with all $s_n \ge 0$, with $\|s\| = 1$, and with $s_j = 0$ for $j < k$. If $s$
and $t$ are two sequences with terms $\ge 0$, then $\|s + t\| = \|s\| + \|t\|$. The convexity
follows, and everything else is easy.

15. Denote open balls in $X$ by $B_X$ and open balls in $Y$ by $B_Y$. The Interior Mapping
Theorem says that $L(B_X(1; 0))$ is open. Hence it contains a ball $B_Y(\epsilon; 0)$. Put
$C = \epsilon^{-1}$. By linearity, $L(B_X(Cr; 0)) \supseteq B_Y(r; 0)$ for every $r > 0$. Since $L$ is onto $Y$,
we can choose $x_0$ in $X$ with $L(x_0) = y_0$. Linearity gives $L(B_X(Cr; x_0)) \supseteq B_Y(r; y_0)$.
For each $y_n$, we can take $r = 2\|y_n - y_0\|$ and choose $x_n$ in $B_X(2C\|y_n - y_0\|; x_0)$ with
$L(x_n) = y_n$. Since $y_n \to y_0$, $x_n \to x_0$. Also, we have $\|x_n - x_0\| \le 2C\|y_n - y_0\|$.

In this construction if $y_0 = 0$, we could choose $x_0 = 0$, and then the result follows
with $M = 2C$.

If $y_0 \ne 0$, then $\|y_n\| \to \|y_0\| \ne 0$ says that $\|y_n\| \le \frac12\|y_0\|$ only finitely often.
For these exceptional $n$’s, we can adjust $x_n$ when $y_n = 0$ so that $x_n = 0$, and then we
have $\|x_n\| \le M\|y_n\|$ for a suitable $M$ and the exceptional $n$’s. For the remaining $n$’s,
an inequality $\|x_n\| \le M\|y_n\|$ is valid as soon as $\{x_n\}$ is bounded, and $\{x_n\}$ has to be
bounded since it is convergent.


16. It will be proved that the distance from $e$ to $X_0$ is $\ge 1$. The set $X_{00}$ of all
sequences $s_1, s_2 - s_1, s_3 - s_2, \dots$ such that $\{s_n\}$ is in $X$ is closed under addition and
scalar multiplication. Hence it is a dense vector subspace of $X_0$, and it is enough to
prove that $\|e - s\| \ge 1$ for all $s$ in $X_{00}$. Let $s$ be in $X_{00}$, and let $c = e - s$. Adding
the first $n$ entries gives $c_1 + \cdots + c_n = n - s_n$. Hence $|c_1 + \cdots + c_n| \ge n - \|s\|$, where
$\|s\|$ denotes the supremum norm of the defining sequence $\{s_n\}$. If,
by way of contradiction, $\|c\| = 1 - \epsilon$ with $\epsilon > 0$, then $|c_j| \le 1 - \epsilon$ for all $j$, and we
have $|c_1 + \cdots + c_n| \le n - n\epsilon$. Thus $n - \|s\| \le n - n\epsilon$, and we get $n\epsilon \le \|s\|$, in
contradiction to the finiteness of $\|s\|$.
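The telescoping step can be illustrated on finite truncations. In the sketch below, a finite list stands in for the first $N$ terms of a bounded sequence $\{s_n\}$; averaging the first $N$ entries of $e - t$, where $t$ is the difference sequence, gives the lower bound $1 - \|s\|/N$ for the supremum of $|1 - t_j|$. The function names are hypothetical, and a finite truncation only illustrates the inequality rather than proving it.

```python
def difference_sequence(s):
    """Map (s_1, ..., s_N) to (s_1, s_2 - s_1, ..., s_N - s_{N-1})."""
    return [s[0]] + [s[j] - s[j - 1] for j in range(1, len(s))]


def sup_distance_to_e(t):
    """Largest value of |1 - t_j| over the available entries, with e = (1, 1, 1, ...)."""
    return max(abs(1 - tj) for tj in t)


def lower_bound(s):
    """The bound 1 - ||s|| / N obtained by adding the first N entries of e - t."""
    n = len(s)
    return 1 - max(abs(sj) for sj in s) / n


if __name__ == "__main__":
    s = [((7 * k * k + 3 * k) % 11) - 5 for k in range(1, 31)]  # an arbitrary bounded sequence
    t = difference_sequence(s)
    print(sup_distance_to_e(t), lower_bound(s))
```

As $N$ grows with $\|s\|$ fixed, the lower bound tends to 1, which is the content of the contradiction argument.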


17. This is immediate from Corollary 12.15 and the previous problem. 


18. For (a), let $s \ge 0$ have $\|s\| \le 1$. Then $\|e - s\| \le 1$, and so $|x^*(e - s)| \le 1$.
Since $x^*(e) = 1$, this says that $|1 - x^*(s)| \le 1$. On the other hand, $|x^*(s)| \le 1$ since
$\|s\| \le 1$. Thus $0 \le x^*(s) \le 1$. We can scale this inequality to handle general $s$.

For (b), the two sequences differ by a member of $X_0$, on which the Banach limit
vanishes identically; then (c) follows by iterated application of (b) since the Banach
limit of the 0 sequence is 0.

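In (d), let $\varepsilon > 0$ be given. By applying (c), we see that we may adjust the
sequence so that $\sup_n s_n - \inf_n s_n < \varepsilon$ and so that the Banach limit is unchanged.
By (a), Banach limits preserve order. Since $(\inf s_n)e \le s \le (\sup s_n)e$, we have
$\inf s_n \le \operatorname{LIM}_{n\to\infty} s_n \le \sup s_n$. Since $\sup s_n = (\sup s_n - \limsup s_n) + \limsup s_n \le
(\sup s_n - \inf s_n) + \limsup s_n \le \limsup s_n + \varepsilon$, we obtain $\operatorname{LIM}_{n\to\infty} s_n \le \limsup s_n + \varepsilon$.
Since $\varepsilon$ is arbitrary, $\operatorname{LIM}_{n\to\infty} s_n \le \limsup s_n$. Similarly $\liminf s_n \le \operatorname{LIM}_{n\to\infty} s_n$.
Conclusion (e) is immediate from (d).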


20. The parallelogram law gives
\[
2\big(\|x + y + z\|^2 + \|y - z\|^2\big) = \|x + 2y\|^2 + \|x + 2z\|^2 .
\]
If we set $z = 0$ in this identity and then set $y = 0$ in it, we get two relations,
one involving an expression for $\|x + 2y\|^2$ and the other involving an expression
for $\|x + 2z\|^2$. If we substitute these relations into the displayed equation for the
terms $\|x + 2y\|^2$ and $\|x + 2z\|^2$, we obtain the formula $\|x + y + z\|^2 + \|y - z\|^2 =
\|x + y\|^2 + \|x + z\|^2 - \|x\|^2 + \|y\|^2 + \|z\|^2$. Substitution of $2\|y\|^2 + 2\|z\|^2 - \|y + z\|^2$
for $\|y - z\|^2$ in this formula gives the desired identity.
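
For reference, the two relations and the final substitution can be written out explicitly; this is a worked sketch of the steps just described, not a quotation of the problem statement. Setting $z = 0$ and then $y = 0$ in the displayed identity gives

\[
\|x + 2y\|^2 = 2\|x + y\|^2 + 2\|y\|^2 - \|x\|^2,
\qquad
\|x + 2z\|^2 = 2\|x + z\|^2 + 2\|z\|^2 - \|x\|^2 .
\]

Adding these, dividing by 2, and then replacing $\|y - z\|^2$ by $2\|y\|^2 + 2\|z\|^2 - \|y + z\|^2$ yields

\[
\|x + y + z\|^2 = \|x + y\|^2 + \|x + z\|^2 + \|y + z\|^2 - \|x\|^2 - \|y\|^2 - \|z\|^2 ,
\]

which is presumably the identity that the problem asks for.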


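21. We have
\[
\begin{aligned}
(x_1 + x_2, y) &= \sum_k \frac{i^k}{4}\,\|x_1 + x_2 + i^k y\|^2 \\
&= \sum_k \frac{i^k}{4}\big(\|x_1 + x_2\|^2 - \|x_1\|^2 - \|x_2\|^2 - \|y\|^2\big) \\
&\qquad + \sum_k \frac{i^k}{4}\,\|x_1 + i^k y\|^2 + \sum_k \frac{i^k}{4}\,\|x_2 + i^k y\|^2 .
\end{aligned}
\]
Each term of the first sum after the second equality sign is 0 because $\sum_k i^k/4 = 0$, and thus the right
side simplifies to $(x_1, y) + (x_2, y)$, as required.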
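
The vanishing of $\sum_k i^k/4$ can be checked directly. The range of $k$ is not restated in this solution, so the following assumes the usual polarization convention: $k$ runs over $\{0, 1, 2, 3\}$ when the scalars are complex and over $\{0, 2\}$ when they are real.

\[
\tfrac14\,(1 + i - 1 - i) = 0,
\qquad
\tfrac14\,(1 - 1) = 0 .
\]

The expansion of $\|x_1 + x_2 + i^k y\|^2$ is the three-vector identity of Problem 20 applied to $x_1$, $x_2$, and $i^k y$, together with $\|i^k y\| = \|y\|$.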


22. Induction with the result of the previous problem gives $(nx, y) = n(x, y)$
for every integer $n > 0$. Replacing $nx$ by $z$, we obtain $\tfrac1n (z, y) = (\tfrac1n z, y)$. Hence
$(rx, y) = r(x, y)$ for every rational $r > 0$. It follows from the definition of $(\,\cdot\,, \cdot\,)$
that $(-x, y) = -(x, y)$ and that if the scalars are complex, $(ix, y) = i(x, y)$.
Consequently $(rx, y) = r(x, y)$ if $r$ is in the set $D$.
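
The passage from integers to positive rationals can be spelled out. For integers $m, n > 0$, a short sketch that uses the integer case and then the $\tfrac1n$ case is

\[
\big(\tfrac{m}{n}\,x,\ y\big) = m\,\big(\tfrac1n\,x,\ y\big) = \tfrac{m}{n}\,(x, y) .
\]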


23. We are to prove that $|(x, y)| \le \|x\|\,\|y\|$, and we may assume that $y \ne 0$. If $r$
is in $D$, we have
\[
0 \le \|x - ry\|^2 = (x - ry, x - ry) = \|x\|^2 - r(y, x) - \bar r(x, y) + |r|^2\|y\|^2 .
\]
Letting $r$ tend to $(x, y)/\|y\|^2$ through members of $D$, we obtain
\[
0 \le \|x\|^2 - 2|(x, y)|^2/\|y\|^2 + |(x, y)|^2\|y\|^2/\|y\|^4 = \|x\|^2 - |(x, y)|^2/\|y\|^2 ,
\]
and it follows that $|(x, y)| \le \|x\|\,\|y\|$.


24. The Schwarz inequality gives
\[
|r(x, y) - (cx, y)| = |(rx - cx, y)| \le \|(r - c)x\|\,\|y\| = |r - c|\,\|x\|\,\|y\| .
\]
As $r$ tends to $c$ through $D$, the right side tends to 0, and the left side tends to
$|c(x, y) - (cx, y)|$. Hence $c(x, y) = (cx, y)$.
INDEX OF NOTATION


See also the list of Standard Notation on page xxi. In the list below, items 
are alphabetized according to their key symbols. For letters the order is italic 
lower case, Roman lower case, italic upper case, Roman upper case, script, and 
blackboard bold. Next come items whose key symbol is Greek, and then come 
items whose key symbol is a nonletter. The last of these are grouped by type. 


$a_n$, 61

arccos, 79 

arcsin, 51 

arctan, 51 

A(D), 523 

$b_n$, 61

$B(r; x)$, 83

B(S), 87, 281, 521 
B(S,C), 87 
B(S,R), 87 

B,, 297 

By, 297 

B(X), 488 

B(X, Y), 284, 524 
Bo(X), 504 
By(K), 318 
By(V), 315 

c, 522 

co, 522 

$c_n$, 61, 335

card, 577 

A, 93 

cos, 47 

$C^\infty$, 142
C(S), 100, 281, 464, 521
C(S,C), 100, 512 
C(S,R), 100, 512 


Co(X), 509, 521 
C°(E), 142 
C*(E), 142 
Ccom(X), 300, 314, 486, 521 
CN (fa, b]), 522 
d, 41 

dy, 75 

d(x, y), 83 

dx, 298 

dw, 326, 491, 517 
D, 384 

Dj, 385 

$D_N$, 69

e, 47 

e*, 148 

ej, 566 

exp, 47, 148 

E,, 267 

E’, 267 

E[f |B], 440 

$F_\sigma$, 492

F, 374 

F, 135 

F", 565 

$G_\delta$, 492

H, 397 

H,, 401 

H®(D), 523 
inf, 3, 6

$I_E$, 171, 241
Tr(s), 242 
Jp(t), 225 
$K_N$, 71
K(X), 490 
$\ell^2$, 87

€f 0? 522 
lim, 5, 98 
lim inf, 7 
lim sup, 7 
log, 49 

$L^1$, 89, 281
$L^2$, 89, 281
$L^\infty$, 281
L,, 544 
L(y), 208 
L(P, f), 27, 162 


$L^p(X)$, $L^p(X, \mathcal{A}, \mu)$, $L^p(X, \mu)$, 281, 522
m, 236, 298 
mr(f), 162 
M(X), 522 
M(X,C), 514 
M(X,R), 512 
Mr(f), 162 
$A^o$, 92
$\operatorname{osc}_g(x_0)$, 119, 167
p’, 411 
$X = P \cup N$, 419
P(A), 208 
P(x,t), P(x), 395 
$P_r(\theta)$, 392
Q(D), 384 
$Q(x, \varepsilon)$, $Q_\varepsilon(x)$, 398
QOn(a,...,an), 302 
Q*, 134 
Q,, 133 
Q» 134 
R[a, b], 27
R(A), 162
Rips f dx, 318
R*, 6, 85 
sn(f;x), 62, 335 
sin, 47 

sup, 2, 5 

S”, 126, 326 
S(P, {ti}, f), 39 
S(P, {tr}, f), 163 
S, 385 

S(R™), 385 

Tr X, 149 

T, 442 

T*, 455 

uj, 566 

U(P, f), 27, 162 
V(f), 355 
V*(f), V-(f), 347 
Zp, 134 


Greek 

$\Gamma(s)$, 323

$\Delta$, 181, 392
$\Delta x_i$, 27
$E \,\Delta\, F$, 233
$\zeta(s)$, 390

t, 541 

Ap, 350 

$\mu^*$, 254, 255
$\mu_*$, 254, 255
Uf, 350 
UP), 27, 161 
[du], 248 
$\nu = f\,d\mu$, 252
On, 53 

0;, 54 

T,, 303 

@-, 312 

Q, 292 

Qy, 326 
Isolated symbols and signs Specific functions
€, 554, 555 Cee ie 75 
oo, 5 | - |l,, 75, 279 
G, 23.555 - |, 84, 136, 138, 161, 239, 514 
+oo, —co, 6 x-y, 84 
rt, r_, 347 (-, +), 89, 523 
Dehra: 209 | - ||, 90, 136, 279, 514, 521, 523 
vt, v, 418 : be 133 
~, 62, 335 | - Il, 279 
<«K, 420 | © Ihege 279 
®, 528 lege 28t 
Subscripts and superscripts l= Ilpy, 347 
X*, 128, 444, 455, 524 | + Ip» 410 
f*(x), 328 
M+, 527 Intervals 
L*, 536 Cae 
x*, 537 la, 0], 3 
24, 555 la), 3 

(a, b], 3 
Operations on sets 
Ax B, 266 our 
X/~, 445, 471 [Mj], 566 
. |. 268 (X,T), 442 
BA, 556 an > a, 5, 98 


ah 139 
Uses Ax, ee Ax, XK ves Ax: 557 wale ’ 
ba Aj(X1, +++ Xn aes 199 


Operations on functions (X, A, w), 238 
f +, S58 (D, <), 465 

f *g, 180, 306, 334 {xg}, 465 

wx v, 270 Qt> Xq, 465 
f, 374 


ffx, [?fdx, 27 
(pee J fax, 162 
fo fx, 27, 30 

J, fx, 162 


Se f du, 242 
Je f@), du(x), 242 
a.e., 248
Abel summable, 54 
Abel sums, 54 
Abel’s Theorem, 54 
Abelian theorem, 55 
absolute value, 563 
absolutely continuous 
measure, 420 
monotone function, 369 
Stieltjes measure, 369 
abstract rectangle, 266, 297 
additive set function, 234 
bounded, 418 
adjoint, 536 
Alaoglu’s Theorem preliminary form, 284, 509 
algebra, 124 
algebra of sets, 232 
almost every, 248 
almost everywhere, 248 
almost period, 131 
almost periodic, 131 
Bochner, 131 
Bohr, 131 
alternating series test, 19 
approximate identity, 58, 60, 312 
arccosine, 79
archimedean property, 4
arcsine, 51
arctangent, 51
area of sphere, 326
arithmetic operations, 104
arithmetic-geometric mean inequality, 182
Ascoli’s theorem, 22, 121, 229
Ascoli–Arzelà Theorem, 478
associated Stieltjes measure, 344
Axiom of Choice, 557
axiom of countability, 450


Baire Category Theorem, 118, 120, 481
Baire function, 505
Baire measurable function, 505
Baire measure, 506
Baire set, 504
ball, open, 83
Banach limit, 550
Banach space, 521
quotient, 549
reflexive, 541
Banach–Steinhaus Theorem, 543
base
countable local, 450
for metric space, 106
for topology, 446
local, 450
basis
of vector space, 573
standard, 566
Bessel function, 225
Bessel’s equation, 224
Bessel’s inequality, 67, 335, 531
binomial coefficient, xxi
binomial series, 52, 56
Bochner almost periodic, 131
Bochner’s Theorem, 406
Bohr almost periodic, 131
Bolzano–Weierstrass property, 108, 109
Bolzano–Weierstrass Theorem, 9, 108
Borel complex measure, regular, 514
Borel function, 297, 316, 318, 504
Borel measurable, 239, 297
Borel measurable function, 504
Borel measure, 299, 316, 318, 490
purely irregular, 519
regular, 299, 316, 318, 490
Borel set, 237, 297, 315, 318, 488, 504
Borel signed measure, regular, 512
bound
greatest lower, 3
least upper, 2
lower, 3
upper, 2
bounded, 478 
bounded additive set function, 418 
bounded, essentially, 280 
bounded function, 87 
bounded interval, 3 
bounded linear operator, 283, 523 
bounded operator, 283 
bounded, pointwise, 22, 121, 478
bounded set, 3, 490 
bounded, uniformly, 22, 121, 478 
bounded variation, 346, 355 


$C^k$ function, 142
$C^\infty$ function, 142
canonical expansion of simple function, 241 
canonical extension of measure, 261 
canonical map, 541 
Cantor, 579
Cantor diagonal process, 24, 461 
Cantor function, 344 
Cantor measure, 354, 519
Cantor set, 118, 290, 343, 519
standard, 118 
cardinal number, 577 
cardinality, 577 
Cartesian product, 556, 557 
Cauchy criterion, 10 
uniform, 18 
Cauchy Integral Theorem, 378 
Cauchy sequence, 9, 112 
uniformly, 18 
Cauchy—Peano Existence Theorem, 229 
Cauchy—Riemann equations, 379, 398 
Cauchy—Schwarz inequality, 563 
Cesàro summable, 53
Cesàro sums, 53, 371
chain, 573 
chain rule, 144 
change-of-variables formula, 37, 172, 320 
character, multiplicative, 407 
characteristic function, 171, 241 
characteristic polynomial, 208, 214 
Chebyshev’s inequality, 351 
class, 555 
equivalence, 564 
class $C^k$, 142
class $C^\infty$, 142
closed geometric rectangle, 161 
Closed Graph Theorem, 548 
closed interval, 3, 6 
closed map, 482
closed rectangle, 161 
closed set, 3, 92, 442 
closed subspace, 102 
closed unit disk, 126 
closed vector subspace, 104 
closure, 93, 442 
cofactors, 568 
cofinal, 468 
collection, 555 
compact metric space, 108 
compact set, 108, 453 
compact topological space, 453 
compactification, one-point, 456 
complement, 555 
complete metric space, 113 
completely additive set function, 234 
σ-finite, 237
completeness of $L^p$, 285, 414
completion of measure space, 262, 291 
completion of metric space, 128, 133 
complex conjugation, 563 
complex Euclidean space, 85 
complex measure, regular Borel, 514 
component, 483 
component rectangles, 161 
components of a function, 139 
composition, 558 
conditional expectation, 439 
conjugate Poisson integral, 398 
conjugate Poisson kernel, 398 
conjugation, complex, 563 
connected, 482 
locally, 131 
locally pathwise, 131 
connected component, 483 
connected metric space, 115 
connected set, 115 
constant coefficients, 208, 211 
content 0, 167 
continuity at a point, 95 
continuity, uniform, 111 
continuous derivative on a closed interval, 561 
continuous from the left, 339 
continuous from the right, 339 


continuous function, 10, 96, 443 
uniformly, 11 
continuous linear functional on $L^p$, 425
continuous periodic, 67 
continuous singular Stieltjes measure, 368 
contraction mapping, 131 
contraction mapping principle, 131 
converge, 463, 466 
convergence 
norm, 284 
uniform, 17 
weak-star, 284, 355, 405, 437 
convergent sequence, 5, 97 
convolution, 58, 180, 306, 334, 417 
of measures, 405 
cosine, 47 
countable, xxi, 106 
countable local base, 450 
countable ordinals, 292 
countably additive set function, 234 
counting measure, 236 
cover, 106, 151 
Cramer’s rule, 568 
critical point, 323 
critical value, 323 
curve, 199 
integral, 199 
cut, 2 


defined implicitly, 153 
definite, 89 
degree of polynomial, 568 
delta mass, 342 
delta measure, 342 
dense, 106 
dense set, 450 
derivative, 140 
at endpoint of interval, 561 
on a closed interval, 561 
partial, 139 
determinant, 567 
Wronskian, 205 
diadic cube, 302 
diagonal process, 24 
diffeomorphism, 160 
difference, 555 
differentiable, 139 
infinitely, 46 
periodic, 67 
differential equation
existence theorem for ordinary system, 189, 229
first-order ordinary linear, 228 
ordinary, 183 
ordinary homogeneous linear, 184, 201 
ordinary inhomogeneous linear, 201 
ordinary linear, 184, 201 
system of ordinary, 188 
uniqueness theorem for ordinary system, 193 
differential, total, 153 
differentiation, implicit, 153 
differentiation of integrals, 35, 329, 363, 370 
strong, 431, 438
differentiation of monotone functions, 359 
differentiation of series of monotone functions, 
362 
dilation, 303 
Dini’s test, 70, 336 
Dini’s Theorem, 78, 132, 480, 492 
direct image, 559 
directed set, 465 
Dirichlet kernel, 69 
Dirichlet-Jordan Theorem, 348 
discrete metric, 87 
discrete topology, 443 
disk, 126, 181 
distance, 41, 75, 83 
to a set, 96
distribution function, 339, 351 
divide, 568 
Division Algorithm, 569 
divisor, 568 
greatest common, 570 
domain, 556 
Dominated Convergence Theorem, 253 
dot product, 84, 566 
doubly infinite sequence, 5 
dual group of finite abelian group, 407 
dual index, 411, 525
dual of normed linear space, weak-star 
topology, 444 
dual space, 284, 524 


Egoroff’s Theorem, 291, 292, 517
element, 554, 555 

elementary matrix, 174 
elementary set, 233 

entity, 554 
equicontinuous, 22, 121, 227, 230, 477, 478
uniformly, 22, 121 
equivalence class, 564 
equivalence relation, 564 
essential bound, 280 
essential supremum, 280 
essentially bounded, 280 
Euclidean algorithm, 570 
Euclidean norm, 84, 85 
Euclidean space, 84, 521 
Euler’s equation, 222 
eventually, 97, 463, 465 
everywhere dense, 106 
existence theorem 
integral curves, 199 
system of ordinary differential equations, 
189, 229 
expansion by cofactors, 568 
expectation, conditional, 439 
exponential, 47 
of a matrix, 148 
extended real number, 6, 85 
Extension Theorem, 238, 253, 271, 489, 502


$F_\sigma$, 492

factor of polynomial, 568 

Factor Theorem, 569 

family, 555 

fast Fourier transform, 391, 407 

Fatou’s Lemma, 252 

Fatou’s Theorem, 334, 395 

Fejér kernel, 71 

Fejér’s Theorem, 72, 337, 371 

field of scalars, 279 

filter, 294 

finest topology, 445 

finite abelian group, Fourier analysis on, 407 
finite interval, 3 

finite limit for derivative at endpoint, 561 
finite measure space, 238 
finite-intersection property, 109, 454 
first axiom of countability, 450 

first category, 119 

first countable, 450 


first-order linear ordinary differential equation, 228
flip, 174 
Fourier coefficient, 62, 335 




Fourier inversion formula, 380 
Fourier series 
almost everywhere convergence 
of Cesàro sums, 371
Bessel’s inequality, 67, 335
convergence in $L^p$, 439
convolution, 180 
Dini’s test, 70, 336 
Dirichlet-Jordan Theorem, 348 
divergence for a continuous function, 544 
failure of some sequence vanishing at 
infinity to occur, 547 
Fejér’s Theorem, 72, 337 
localization, 71 
Parseval’s Theorem, 74, 338 
Riemann—Lebesgue Lemma, 67, 335 
Riesz—Fischer Theorem, 338 
uniqueness theorem, 73, 77, 338 
with harmonic functions, 227 
with Lebesgue integral, 335 
with Riemann integral, 62 
Fourier transform, 374 
fast, 391, 407 
for finite abelian group, 407 
of measure, 406 
Fourier—Stieltjes coefficient, 349 
Fourier-Stieltjes series, 349 
frequently, 465 
Fubini’s Theorem, 15, 169, 170, 271, 273 
Fubini’s theorem on differentiation of series 
of monotone functions, 362 
function, 556 
functional, linear, 283 
Fundamental Theorem of Algebra, 112, 572
Fundamental Theorem of Calculus, 35, 329, 
363, 370 


$G_\delta$, 492

gamma function, 323, 353 

gap, 484 

geometric rectangle, 161, 266, 297
closed, 161 

Gram determinant, 549 

Gram matrix, 549 

Gram–Schmidt orthogonalization process, 530, 549
graph, 291 
greatest common divisor, 570 


greatest lower bound, 3
Green’s Theorem, 378 


Hahn decomposition, 419 
Hahn-Banach Theorem, 537 
half space 
Poisson integral formula, 395 
Poisson kernel, 395 
half-open interval, 3, 6, 451 
Hardy–Littlewood maximal function, 330, 403, 438
Hardy–Littlewood Maximal Theorem, 328, 333, 430
harmonic function, 181, 334, 354, 392, 437 
unit disk, 227, 518 
Hausdorff, 105, 447 
Hausdorff—Young Theorem, 428 
hedgehog space, 88 
Heine—Borel Theorem, 108, 110 
Helly’s Selection Principle, 78 
Helly—Bray Theorem, 405, 508 
Herglotz’s Theorem, 518 
Hermitian inner product, 85, 89 
Hermitian symmetric, 89 
Hilbert cube, 88 
Hilbert space, 523 
dimension, 534 
Hilbert transform, 397, 433 
existence almost everywhere, 438 
Hilbert-Schmidt norm, 138 
Hölder’s inequality, 411
homeomorphism, 96, 443 
homogeneous function, 180 
homogeneous linear ordinary differential 
equation, 184, 201, 228 


image, 556, 567 

direct, 559 

inverse, 559 
imaginary part, 563 
implicit differentiation, 153 
Implicit Function Theorem, 155 
indicator function, 171, 241 
indicial equation, 222, 223 
indiscrete space, 87 
infimum, 3 
infinite Taylor series, 44 
infinitely differentiable, 46 
infinity, 5, 6 
vanish at, 509
inhomogeneous linear ordinary differential 
equation, 201 
initial condition, 186 
inner measure, 255 
inner product, 89, 523 
inner-product space, 89 
pseudo, 90 
integrable, 242, 276
Lebesgue, 242 
Riemann, 27, 42, 162 
uniformly, 292 
vector-valued function, 274, 277 
integral curve, 199 
integral, Lebesgue, 242 
integral, Riemann, 42 
integration by parts, 36, 67, 344 
interchange of limits, 13 
interior, 92, 442 
Interior Mapping Principle, 545 
Intermediate Value Theorem, 12, 116 
intersection, 555 
interval, 3, 6
inverse function, 558 
Inverse Function Theorem, 156, 562 
inverse image, 559 
irregular singular point, 227 
isometric, 127 
isometry, 127 


Jacobian matrix, 140 
Jessen—Marcinkiewicz—Zygmund, 438 
Jordan and von Neumann Theorem, 526, 551 
Jordan block, 212 

Jordan decomposition, 418 

Jordan form, 212 

Jordan normal form, 213 


kernel 
Dirichlet, 69 
Fejér, 71 
of linear function, 567 


$L^p$ completeness, 285, 414
$L^p$ dual, 425
$L^p$ norm, 410
$L^p$ translation, 309, 417
Lagrange multipliers, 182, 411 
Laplace equation, 181, 392 
Laplacian, 181, 392
least upper bound, 2, 574 
Lebesgue constant, 544 
Lebesgue decomposition, 368, 423 
Lebesgue integral, 242 
Lebesgue measurable, 239, 262, 265, 305 
Lebesgue measure, 118, 236, 298, 303 
Lebesgue set, 371 
Lebesgue’s theorem on differentiation of 
monotone functions, 359 
Legendre polynomial, 221 
Legendre’s equation, 220 
Leibniz test, 19 
length, 118 
lexicographic ordering, 484 
limit, 5, 463, 466 
Banach, 550 
of a sequence, 98 
point, 3, 93, 442
uniform, 17 
limits, interchange of, 13 
Lindelöf space, 451
linear function, 566 
linear functional, 283, 525
norm, 424 
positive, 488 
linear map, 566 
linear operator, 283, 523 
bounded, 283, 523 
continuous, 283 
linear ordinary differential equation, 184, 201 
constant coefficients, 208 
homogeneous, 228 
linear transformation, 566 
Lipschitz condition, 189 
local base, 450 
countable, 450 
localization of Fourier series, 71 
localization of Lebesgue integral, 246 
locally compact, 455 
locally connected, 131, 483 
locally finite open cover, 483 
locally pathwise connected, 131, 483
logarithm, 49 
long line, 484 
lower bound, 3 
greatest, 3 
lower Riemann integral, 27, 162 
lower Riemann sum, 27, 162 




lower semicontinuous, 481 
lower-dimensional set, 325 
Lusin’s Theorem, 517 


map, 556 
linear, 566 
mapping, 556 
Marcinkiewicz Interpolation Theorem, 430 
matrix, 566 
elementary, 174 
Jacobian, 140 
Jordan form, 213 
Wronskian, 205 
maximal element, 573 
maximal function, 330 
Hardy-Littlewood, 330 
Mean Value Theorem, 560 
measurable, Borel, 239, 297 
measurable function, 238 
Baire, 505 
Borel, 504 
measurable, Lebesgue, 239, 262, 265, 305 
measurable set, 238, 241, 257, 266
measurable vector-valued function, 275 
measure, 235 
absolutely continuous, 420 
absolutely continuous Stieltjes, 369 
associated Stieltjes, 344 
Baire, 506 
Borel, 299, 316, 318, 490 
Cantor, 354, 519 
counting, 236 
delta, 342 
inner, 255 
Lebesgue, 236, 298, 303 
outer, 255 
product, 270 
purely irregular Borel, 519 
regular Borel, 299, 316, 318, 490 
regular Borel complex, 514 
regular Borel signed, 512 
signed, 418 
singular, 423 
singular Stieltjes, 368 
Stieltjes, 339 
measure 0, 166, 265 
measure space, 238 
completion, 262, 291 
finite, 238 
σ-finite, 238
member, 555 
mesh, 27, 161 
metric, 83 

discrete, 87 

hedgehog, 88 

product, 103 

uniform, 87 
metric space, 83 

complete, 113 

completion, 128, 133 

connected, 115 
metric subspace, 102, 104 
metric topology, 442 
metrizable, 476 
Minkowski’s inequality, 412 

for integrals, 287, 415
monotone class, 268 
Monotone Class Lemma, 268, 269 
Monotone Convergence Theorem, 15, 250, 251 
monotone increasing, 339 
monotone sequence, 5 
monotone set function, 494 
multiplication formula, 375 
multiplicative character, 407 
multiplicity, 27 

of a root, 572 
multiplier, 384 
multiplier operator, 384 


negative, xxi 
negative variation, 347 
neighborhood, 442 
of a point, 92 
of a subset, 92 
net, 465 
universal, 468 
Newton’s method, 78 
nonnegative set function, 234 
norm, 90, 429, 521 
convergence, 284 
essential supremum, 280 
Euclidean, 84, 85 
Hilbert—Schmidt, 138 
$L^p$, 410
of linear functional, 424
operator, 136, 283, 429, 523
supremum, 281
total-variation, 514 
uniform, 281
weak-type, 430 
normal, 105, 447 
normed linear space, 281, 521
finite-dimensional, 521 
pseudo, 280 
weak topology, 444 
weak-star topology on dual, 444 
nowhere dense, 118 
null space, 567 
number 
extended real, 6 
real, 2 


one-one, 558 

one-point compactification, 456 

onto, 558 

open ball, 83 

open cover, 106, 451 
locally finite, 483 

open interval, 3, 6 

open mapping, 471 

open neighborhood, 92, 442 

open rectangle, 161 

open set, 3, 83, 442 

open subcover, 106, 451 

open subspace, 102 

open unit disk, 181 

operator 
bounded linear, 283 
continuous linear, 283 
linear, 283, 523 
multiplier, 384 
norm, 136, 283, 429, 523 
self-adjoint, 536 
sublinear, 429 

order, 183, 188 

order complete, 484 

order topology, 484 

ordered pair, 555 

ordering 
lexicographic, 484 
partial, 573 
simple, 573 
total, 484, 573 

ordinals, countable, 292 

ordinary differential equation, 183 
constant coefficients, 208 
existence theorem, 189, 229 
first-order linear, 228
homogeneous linear, 184, 201, 228
inhomogeneous linear, 201
linear, 184, 201
uniqueness theorem, 193
ordinary differential equations system, 188
with constant coefficients, 211
orthogonal, 89, 527
orthogonal complement, 528
orthogonal projection, 530
orthonormal, 529
orthonormal basis, 532
orthonormal set, maximal, 532
oscillation, 119, 167, 481
outer measure, 255


p-adic numbers, 133
pair
ordered, 555
unordered, 555
parallelogram law, 526
parameters, 198
Parseval’s equality, 532
Parseval’s Theorem, 74, 338
partial derivative, 139
partial ordering, 573
partition, 26, 161
partition of unity, 151, 483
path, 116, 199, 482
pathwise connected, 116, 482
locally, 131
periodic, 63
Picard iteration, 187
Picard–Lindelöf Existence Theorem, 189
Plancherel formula, 381
point, 555
point mass, 342
pointwise bounded, 22, 121, 478
Poisson integral, 355
conjugate, 398
formula for half space, 395
formula for unit disk, 354, 392, 437
Poisson kernel
conjugate, 398
for half space, 395
for unit disk, 392
Poisson Summation Formula, 389
polar coordinates, 173, 322
polarization, 527
polynomial
characteristic, 208, 214
in one indeterminate, 568
Legendre, 221
prime, 571
positive, xxi
positive definite function, 406
positive linear functional, 488
positive variation, 347
potential theory, 353
power series, 44, 46
prime polynomial, 571
primitive mapping, 175
probability, 171, 241, 339, 439, 486
product
Cartesian, 556, 557
dot, 84, 566
inner, 89, 523
measure, 270
metric, 103
topology, 443, 458
product of metric spaces, 103
product of sets, 556
product of σ-algebras, 266
projection, orthogonal, 530
Projection Theorem, 528
pseudo inner-product space, 90
pseudo normed linear space, 280
pseudometric, 83
pseudometric space, 83
pseudonorm, 90, 279
purely finitely additive, 438
purely irregular Borel measure, 519
Pythagorean Theorem, 526


quotient map, 445, 471
quotient space of a Banach space, 549
quotient topology, 443, 445, 471


radius of convergence, 45, 78
Radon–Nikodym Theorem, 421, 425, 439, 507, 515
range, 556
rank, 567
real number, 2
real part, 563
rearrangement, 12
rectangle, 161, 267, 266, 297
abstract, 266, 297


closed, 161 

closed geometric, 161 

geometric, 266, 297 

open, 161 
reduction of order, 228 
refinement, 162 

of partition, 28 
reflexive, 564, 573
reflexive Banach space, 541 
region under a graph, 291 
regular, 105, 236, 447 
regular Borel complex measure, 514 
regular Borel measure, 299, 316, 318, 490 
regular Borel signed measure, 512 
regular singular point, 221 
relation, 556 

equivalence, 564 

function, 556 

set, 473 
relative topology, 446 
relatively dense, 131 
restriction, 558 
Riemann integrable, 27, 42, 162
Riemann integral, 27, 42, 162 
Riemann sum, 27, 39, 42, 162, 163 
Riemann zeta function, 390 
Riemann—Lebesgue Lemma, 67, 335, 380 
Riesz Convexity Theorem, 427 
Riesz, Frigyes and Marcel, 427 
Riesz Representation Theorem, 425, 490, 528 
Riesz’s Lemma, 358 
Riesz—Fischer Theorem, 338, 383 
ring of sets, 233 
Rising Sun Lemma, 358 
root, 568 
Russell paradox, 553 


saltus function, 366 

Sard’s Theorem, 324 

sawtooth function, 65 

scalar, 87, 279

scalar-valued nonnegative function, 274 
Schroeder—Bernstein Theorem, 578 
Schwartz function, 385 

Schwartz space, 385 

Schwarz inequality, 75, 84, 90, 526, 564 
second axiom of countability, 450 
second countable, 450 

section, 267 
self-adjoint, 536
semidefinite, 90 
seminorm, 279 
separable, 107, 446 
separate points, 124, 462 
separation of variables, 227 
separation properties, 105 
sequence, 4, 97 
Cauchy, 9, 112 
convergent, 5 
doubly infinite, 5 
monotone, 5 
space, 87 
series 
binomial, 52, 56 
Fourier (see Fourier series) 
Fourier-Stieltjes, 349 
power, 44, 46 
Taylor, 46 
trigonometric, 61 
set, 554 
set function, 234 
additive, 234 
bounded additive, 418 
completely additive, 234 
countably additive, 234 
monotone, 494 
nonnegative, 234 
purely finitely additive, 438 
σ-finite completely additive, 237
set theory, 553 
Zermelo—Fraenkel, 554 
side, 266 
σ-algebra of sets, 233
σ-bounded set, 490
σ-compact, 457
σ-finite completely additive set function, 237
σ-finite measure space, 238
σ-ring of sets, 233
signed measure, 418 
regular Borel, 512 
Silverman—Toeplitz summability method, 80 
simple function, 241 
canonical expansion, 241 
simple ordering, 573 
sine, 47 
singleton, 555 
singular measure, 423 
singular point 
irregular, 227
regular, 221 
singular Stieltjes measure, 368 
smallest closed vector subspace, 281 
smooth function, 142 
smooth vector field, 199 
solution, 183, 188 
sphere, 126 
area, 326 
spherical coordinates, 325 
standard basis, 566 
standard Cantor set, 118 
Stieltjes measure, 339 
absolutely continuous, 369 
associated, 344 
singular, 368 
Stone—Weierstrass Theorem, 124, 132, 480 
strong differentiation, 431, 438 
strong type, 429 
subalgebra, 124 
subcover, 451 
subdivision point, 26 
sublinear operator, 429 
subnet, 468 
subordinate to a cover, 151 
subsequence, 5 
subspace, 102, 104 
closed, 102 
metric, 102, 104 
open, 102 
smallest closed vector, 281 
topological, 446 
vector, 104 
summability, 53 
Abel, 54 
Cesàro, 53
Silverman—Toeplitz, 80 
support, 172, 300, 488 
supremum, 2 
essential, 280 
norm, 281 
symmetric, 89, 564 
symmetric difference, 232 


system of ordinary differential equations, 188 


constant coefficients, 211 
existence theorem, 189, 229 
uniqueness theorem, 193 


$T_1$, 105, 447
Tauberian theorem, 55
Taylor series, 46 

infinite, 44 
Taylor’s Theorem, 43, 146 
tends, 5 
Tietze Extension theorem, 483 
topological space, 442 
topological subspace, 446 
topology, 442 

discrete, 443 

finest, 445 

half-open interval, 451 

metric, 442 

order, 484 

product, 443, 458 

quotient, 443, 445, 471

relative, 446 

upper, 481 

weak, 443 
total differential, 153 
total ordering, 484, 573 
total variation, 355, 514
total-variation norm, 514 
totally bounded, 113 
trace of matrix, 149 
transfinite induction, 293 
transitive, 564, 573 
translation, 303 

in $L^p$, 309, 417
triangle inequality, 41, 76, 83, 90, 279, 563 
trigonometric series, 61 
trivial ultrafilter, 294 
Tychonoff Product Theorem, 460, 470 
Tychonoff’s Lemma, 452 
type, 429 

strong, 429 

weak, 429 


ultrafilter, 294 

trivial, 294 
ultrametric inequality, 133 
unbounded interval, 3 
Uniform Boundedness Theorem, 543 
uniform Cauchy criterion, 18 
uniform continuity, 111 
uniform convergence, 17, 99
uniform limit, 17 
uniform metric, 87 
uniform norm, 281 


uniformly bounded, 22, 99, 121, 478
uniformly Cauchy, 18 
uniformly continuous function, 11 
uniformly equicontinuous, 22, 121 
uniformly integrable, 292 
union, 555 
unique factorization, 571 
uniqueness theorem 
Fourier series, 73, 77, 338 
integral curves, 199 
system of ordinary differential equations, 
193 
unit disk, 126, 181 
harmonic function, 227, 518 
Poisson integral formula, 354, 392, 437 
Poisson kernel, 392 
unit sphere, 126 
universal net, 468 
unordered pair, 555 
upper bound, 2, 573 
least, 2 
upper Riemann integral, 27, 162 
upper Riemann sum, 27, 162 
upper semicontinuous, 481 
upper topology, 481 
Urysohn Metrization Theorem, 476 
Urysohn’s Lemma, 474 


vanish at infinity, 509 
variation 
bounded, 346, 355 
weak topology, 443
weak-star convergence, 284, 355, 405, 437
weak-star topology of dual of normed linear space, 444
Weierstrass Approximation Theorem, 60
Weierstrass M test, 20 
well ordered, 573, 576
Wiener’s Covering Lemma, 331 
Wronskian determinant, 205 
Wronskian matrix, 205 


Young’s inequality, 428 


Zermelo’s well-ordering theorem, 576 
zeta function, 390